up

master
2025-11-27 15:05:48 +02:00
parent 4831c7fcb0
commit e950474a77
278 changed files with 81498 additions and 672 deletions

@@ -0,0 +1,11 @@
{
"permissions": {
"allow": [
"Bash(dotnet build:*)",
"Bash(dotnet restore:*)",
"Bash(chmod:*)"
],
"deny": [],
"ask": []
}
}

CLAUDE.md Normal file

@@ -0,0 +1,219 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
StellaOps is a self-hostable, sovereign container-security platform released under AGPL-3.0-or-later. It provides reproducible vulnerability scanning with VEX-first decisioning, SBOM generation (SPDX 3.0.1 and CycloneDX 1.6), in-toto/DSSE attestations, and optional Sigstore Rekor transparency. The platform is designed for offline/air-gapped operation with regional crypto support (eIDAS/FIPS/GOST/SM).
## Build Commands
```bash
# Build the entire solution
dotnet build src/StellaOps.sln
# Build a specific module (example: Concelier web service)
dotnet build src/Concelier/StellaOps.Concelier.WebService/StellaOps.Concelier.WebService.csproj
# Run the Concelier web service
dotnet run --project src/Concelier/StellaOps.Concelier.WebService
# Build CLI for current platform
dotnet publish src/Cli/StellaOps.Cli/StellaOps.Cli.csproj --configuration Release
# Build CLI for specific runtime (linux-x64, linux-arm64, osx-x64, osx-arm64, win-x64)
dotnet publish src/Cli/StellaOps.Cli/StellaOps.Cli.csproj --configuration Release --runtime linux-x64
```
## Test Commands
```bash
# Run all tests
dotnet test src/StellaOps.sln
# Run tests for a specific project
dotnet test src/Scanner/__Tests/StellaOps.Scanner.WebService.Tests/StellaOps.Scanner.WebService.Tests.csproj
# Run a single test by filter
dotnet test --filter "FullyQualifiedName~TestMethodName"
# Run tests with verbosity
dotnet test src/StellaOps.sln --verbosity normal
```
**Note:** Tests use Mongo2Go, which requires OpenSSL 1.1 on Linux. Run `scripts/enable-openssl11-shim.sh` before testing if needed.
## Linting and Validation
```bash
# Lint OpenAPI specs
npm run api:lint
# Validate attestation schemas
npm run docs:attestor:validate
# Validate Helm chart
helm lint deploy/helm/stellaops
```
## Architecture
### Technology Stack
- **Runtime:** .NET 10 (`net10.0`) with latest C# preview features
- **Frontend:** Angular v17 (in `src/UI/StellaOps.UI`)
- **Database:** MongoDB (driver version ≥ 3.0)
- **Testing:** xUnit with Mongo2Go, Moq, Microsoft.AspNetCore.Mvc.Testing
- **Observability:** Structured logging, OpenTelemetry traces
- **NuGet:** Use the single curated feed and cache at `local-nugets/`
### Module Structure
The codebase follows a monorepo pattern with modules under `src/`:
| Module | Path | Purpose |
|--------|------|---------|
| Concelier | `src/Concelier/` | Vulnerability advisory ingestion and merge engine |
| CLI | `src/Cli/` | Command-line interface for scanner distribution and job control |
| Scanner | `src/Scanner/` | Container scanning with SBOM generation |
| Authority | `src/Authority/` | Authentication and authorization |
| Signer | `src/Signer/` | Cryptographic signing operations |
| Attestor | `src/Attestor/` | in-toto/DSSE attestation generation |
| Excititor | `src/Excititor/` | VEX document ingestion and export |
| Policy | `src/Policy/` | OPA/Rego policy engine |
| Scheduler | `src/Scheduler/` | Job scheduling and queue management |
| Notify | `src/Notify/` | Notification delivery (Email, Slack, Teams) |
| Zastava | `src/Zastava/` | Container registry webhook observer |
### Code Organization Patterns
- **Libraries:** `src/<Module>/__Libraries/StellaOps.<Module>.*`
- **Tests:** `src/<Module>/__Tests/StellaOps.<Module>.*.Tests/`
- **Plugins:** Follow naming `StellaOps.<Module>.Connector.*` or `StellaOps.<Module>.Plugin.*`
- **Shared test infrastructure:** `StellaOps.Concelier.Testing` provides MongoDB fixtures
### Naming Conventions
- All modules are .NET 10 projects, except the UI (Angular)
- Module projects: `StellaOps.<ModuleName>`
- Libraries/plugins common to multiple modules: `StellaOps.<LibraryOrPlugin>`
- Each project lives in its own folder
### Key Glossary
- **OVAL** — Vendor/distro security definition format; authoritative for OS packages
- **NEVRA / EVR** — RPM and Debian version semantics for OS packages
- **PURL / SemVer** — Coordinates and version semantics for OSS ecosystems
- **KEV** — Known Exploited Vulnerabilities (flag only)
## Coding Rules
### Core Principles
1. **Determinism:** Outputs must be reproducible - stable ordering, UTC ISO-8601 timestamps, immutable NDJSON where applicable
2. **Offline-first:** Remote host allowlist, strict schema validation, avoid hard-coded external dependencies unless explicitly allowed
3. **Plugin architecture:** Concelier connectors, Authority plugins, Scanner analyzers are all plugin-based
4. **VEX-first decisioning:** Exploitability modeled in OpenVEX with lattice logic for stable outcomes
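
The determinism principle above (stable ordering, UTC ISO-8601 timestamps, NDJSON) can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from the repository; the `id` sort field, the `observedAt` key, and both helper names are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def utc_iso8601(moment: datetime) -> str:
    """Render a timezone-aware datetime as a UTC ISO-8601 string with a Z suffix."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_ndjson(records: list[dict], sort_field: str = "id") -> str:
    """Serialize records as NDJSON whose bytes do not depend on input order."""
    # Stable ordering: sort rows by a chosen field (hypothetical "id").
    ordered = sorted(records, key=lambda record: record[sort_field])
    # sort_keys and fixed separators make equal data serialize identically.
    lines = [
        json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for record in ordered
    ]
    return "".join(line + "\n" for line in lines)


# Example: a +02:00 local timestamp is normalized to UTC before it is written.
observed = datetime(2025, 11, 27, 15, 5, 48, tzinfo=timezone(timedelta(hours=2)))
rows = [
    {"id": "b", "observedAt": utc_iso8601(observed)},
    {"observedAt": utc_iso8601(observed), "id": "a"},
]
print(to_ndjson(rows), end="")
```

Feeding the same rows in a different order, or with keys in a different order, yields the same bytes, which is the property a determinism check would compare.
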
### Implementation Guidelines
- Follow .NET 10 and Angular v17 best practices
- Maximise reuse and composability
- Never regress determinism, ordering, or precedence
- Every change must be accompanied by or covered by tests
- Gate LLM usage: use it only where explicitly configured
### Test Layout
- Module tests: `StellaOps.<Module>.<Component>.Tests`
- Shared fixtures/harnesses: `StellaOps.<Module>.Testing`
- Tests use xUnit, Mongo2Go for MongoDB integration tests
### Documentation Updates
When scope, contracts, or workflows change, update the relevant docs under:
- `docs/modules/**` - Module architecture dossiers
- `docs/api/` - API documentation
- `docs/risk/` - Risk documentation
- `docs/airgap/` - Air-gap operation docs
## Role-Based Behavior
When working in this repository, behavior changes based on the role specified:
### As Implementer (Default for coding tasks)
- Work only inside the module's directory defined by the sprint's "Working directory"
- Cross-module edits require explicit notes in commit/PR descriptions
- Do **not** ask clarifying questions. If ambiguity exists:
  - Mark the task as `BLOCKED` in the sprint `Delivery Tracker`
  - Add a note in `Decisions & Risks` describing the issue
  - Skip to the next unblocked task
- Maintain status tracking: `TODO → DOING → DONE/BLOCKED` in sprint files
- Read the module's `AGENTS.md` before coding in that module
### As Project Manager
- Sprint files follow format: `SPRINT_<IMPLID>_<BATCHID>_<SPRINTID>_<topic>.md`
- IMPLID epochs: `1000` basic libraries, `2000` ingestion, `3000` backend services, `4000` CLI/UI, `5000` docs, `6000` marketing
- Normalize sprint files to standard template while preserving content
- Ensure module `AGENTS.md` files exist and are up to date
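
As a rough illustration of the sprint file naming format above, the following Python sketch splits a file name into its parts. It assumes that IMPLID is numeric and that BATCHID and SPRINTID contain no underscores; neither is stated in this file. The function name and the example file name are hypothetical.

```python
import re
from typing import Optional

# Assumption: IMPLID is numeric, BATCHID and SPRINTID contain no underscores,
# and the topic is everything between them and the ".md" suffix.
SPRINT_NAME = re.compile(
    r"^SPRINT_(?P<implid>\d+)_(?P<batchid>[^_]+)_(?P<sprintid>[^_]+)_(?P<topic>.+)\.md$"
)

# Epoch labels exactly as listed in the project-manager notes.
EPOCHS = {
    1000: "basic libraries",
    2000: "ingestion",
    3000: "backend services",
    4000: "CLI/UI",
    5000: "docs",
    6000: "marketing",
}


def parse_sprint_name(filename: str) -> Optional[dict]:
    """Return the parts of a sprint file name, or None if it does not match."""
    match = SPRINT_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    # Exact lookup only; how IMPLIDs between epochs map is not specified here.
    fields["epoch"] = EPOCHS.get(int(fields["implid"]))
    return fields


# Hypothetical file name, used only to exercise the pattern.
print(parse_sprint_name("SPRINT_3000_0001_0002_policy_engine.md"))
```
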
### As Product Manager
- Review advisories in `docs/product-advisories/`
- Check for overlaps with `docs/product-advisories/archived/`
- Validate against module docs and existing implementations
- Hand over to project manager role for sprint/task definition
## Task Workflow
### Status Discipline
Always update task status in `docs/implplan/SPRINT_*.md`:
- `TODO` - Not started
- `DOING` - In progress
- `DONE` - Completed
- `BLOCKED` - Waiting on decision/clarification
### Prerequisites
Before coding, confirm required docs are read:
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- Relevant module dossier (e.g., `docs/modules/<module>/architecture.md`)
- Module-specific `AGENTS.md` file
### Git Rules
- Never use `git reset` unless explicitly told to do so
- Never skip hooks (`--no-verify`, `--no-gpg-sign`) unless explicitly requested
## Configuration
- **Sample configs:** `etc/concelier.yaml.sample`, `etc/authority.yaml.sample`
- **Plugin manifests:** `etc/authority.plugins/*.yaml`
- **NuGet sources:** Curated packages in `local-nugets/`, public sources configured in `Directory.Build.props`
## Documentation
- **Architecture overview:** `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- **Module dossiers:** `docs/modules/<module>/architecture.md`
- **API/CLI reference:** `docs/09_API_CLI_REFERENCE.md`
- **Offline operation:** `docs/24_OFFLINE_KIT.md`
- **Quickstart:** `docs/10_CONCELIER_CLI_QUICKSTART.md`
- **Sprint planning:** `docs/implplan/SPRINT_*.md`
## CI/CD
Workflows are in `.gitea/workflows/`. Key workflows:
- `build-test-deploy.yml` - Main build, test, and deployment pipeline
- `cli-build.yml` - CLI multi-platform builds
- `scanner-determinism.yml` - Scanner output reproducibility tests
- `policy-lint.yml` - Policy validation
## Environment Variables
- `STELLAOPS_BACKEND_URL` - Backend API URL for CLI
- `STELLAOPS_TEST_MONGO_URI` - MongoDB connection string for integration tests
- `StellaOpsEnableCryptoPro` - Enable GOST crypto support (set to `true` in build)

@@ -21,22 +21,33 @@
| 1 | POLICY-ENGINE-80-002 | TODO | Depends on 80-001. | Policy · Storage Guild / `src/Policy/StellaOps.Policy.Engine` | Join reachability facts + Redis caches. |
| 2 | POLICY-ENGINE-80-003 | TODO | Depends on 80-002. | Policy · Policy Editor Guild / `src/Policy/StellaOps.Policy.Engine` | SPL predicates/actions reference reachability. |
| 3 | POLICY-ENGINE-80-004 | TODO | Depends on 80-003. | Policy · Observability Guild / `src/Policy/StellaOps.Policy.Engine` | Metrics/traces for signals usage. |
| 4 | POLICY-OBS-50-001 | TODO | — | Policy · Observability Guild / `src/Policy/StellaOps.Policy.Engine` | Telemetry core for API/worker hosts. |
| 5 | POLICY-OBS-51-001 | TODO | Depends on 50-001. | Policy · DevOps Guild / `src/Policy/StellaOps.Policy.Engine` | Golden-signal metrics + SLOs. |
| 6 | POLICY-OBS-52-001 | TODO | Depends on 51-001. | Policy Guild / `src/Policy/StellaOps.Policy.Engine` | Timeline events for evaluate/decision flows. |
| 7 | POLICY-OBS-53-001 | TODO | Depends on 52-001. | Policy · Evidence Locker Guild / `src/Policy/StellaOps.Policy.Engine` | Evaluation evidence bundles + manifests. |
| 8 | POLICY-OBS-54-001 | TODO | Depends on 53-001. | Policy · Provenance Guild / `src/Policy/StellaOps.Policy.Engine` | DSSE attestations for evaluations. |
| 9 | POLICY-OBS-55-001 | TODO | Depends on 54-001. | Policy · DevOps Guild / `src/Policy/StellaOps.Policy.Engine` | Incident mode sampling overrides. |
| 4 | POLICY-OBS-50-001 | DONE (2025-11-27) | — | Policy · Observability Guild / `src/Policy/StellaOps.Policy.Engine` | Telemetry core for API/worker hosts. |
| 5 | POLICY-OBS-51-001 | DONE (2025-11-27) | Depends on 50-001. | Policy · DevOps Guild / `src/Policy/StellaOps.Policy.Engine` | Golden-signal metrics + SLOs. |
| 6 | POLICY-OBS-52-001 | DONE (2025-11-27) | Depends on 51-001. | Policy Guild / `src/Policy/StellaOps.Policy.Engine` | Timeline events for evaluate/decision flows. |
| 7 | POLICY-OBS-53-001 | DONE (2025-11-27) | Depends on 52-001. | Policy · Evidence Locker Guild / `src/Policy/StellaOps.Policy.Engine` | Evaluation evidence bundles + manifests. |
| 8 | POLICY-OBS-54-001 | DONE (2025-11-27) | Depends on 53-001. | Policy · Provenance Guild / `src/Policy/StellaOps.Policy.Engine` | DSSE attestations for evaluations. |
| 9 | POLICY-OBS-55-001 | DONE (2025-11-27) | Depends on 54-001. | Policy · DevOps Guild / `src/Policy/StellaOps.Policy.Engine` | Incident mode sampling overrides. |
| 10 | POLICY-RISK-66-001 | DONE (2025-11-22) | PREP-POLICY-RISK-66-001-RISKPROFILE-LIBRARY-S | Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.RiskProfile` | RiskProfile JSON schema + validator stubs. |
| 11 | POLICY-RISK-66-002 | TODO | Depends on 66-001. | Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Inheritance/merge + deterministic hashing. |
| 12 | POLICY-RISK-66-003 | TODO | Depends on 66-002. | Policy · Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.Engine` | Integrate RiskProfile into Policy Engine config. |
| 13 | POLICY-RISK-66-004 | TODO | Depends on 66-003. | Policy · Risk Profile Schema Guild / `src/Policy/__Libraries/StellaOps.Policy` | Load/save RiskProfiles; validation diagnostics. |
| 14 | POLICY-RISK-67-001 | TODO | Depends on 66-004. | Policy · Risk Engine Guild / `src/Policy/StellaOps.Policy.Engine` | Trigger scoring jobs on new/updated findings. |
| 15 | POLICY-RISK-67-001 | TODO | Depends on 67-001. | Risk Profile Schema Guild · Policy Engine Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Profile storage/versioning lifecycle. |
| 11 | POLICY-RISK-66-002 | DONE (2025-11-27) | Depends on 66-001. | Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Inheritance/merge + deterministic hashing. |
| 12 | POLICY-RISK-66-003 | DONE (2025-11-27) | Depends on 66-002. | Policy · Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.Engine` | Integrate RiskProfile into Policy Engine config. |
| 13 | POLICY-RISK-66-004 | DONE (2025-11-27) | Depends on 66-003. | Policy · Risk Profile Schema Guild / `src/Policy/__Libraries/StellaOps.Policy` | Load/save RiskProfiles; validation diagnostics. |
| 14 | POLICY-RISK-67-001 | DONE (2025-11-27) | Depends on 66-004. | Policy · Risk Engine Guild / `src/Policy/StellaOps.Policy.Engine` | Trigger scoring jobs on new/updated findings. |
| 15 | POLICY-RISK-67-001 | DONE (2025-11-27) | Depends on 67-001. | Risk Profile Schema Guild · Policy Engine Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Profile storage/versioning lifecycle. |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-27 | `POLICY-RISK-67-001` (task 15): Created `Lifecycle/RiskProfileLifecycle.cs` with lifecycle models (RiskProfileLifecycleStatus enum: Draft/Active/Deprecated/Archived, RiskProfileVersionInfo, RiskProfileLifecycleEvent, RiskProfileVersionComparison, RiskProfileChange). Created `RiskProfileLifecycleService` with status transitions (CreateVersion, Activate, Deprecate, Archive, Restore), version management, event recording, and version comparison (detecting breaking changes in signals/inheritance). | Implementer |
| 2025-11-27 | `POLICY-RISK-67-001`: Created `Scoring/RiskScoringModels.cs` with FindingChangedEvent, RiskScoringJobRequest, RiskScoringJob, RiskScoringResult models and enums. Created `IRiskScoringJobStore` interface and `InMemoryRiskScoringJobStore` for job persistence. Created `RiskScoringTriggerService` handling FindingChangedEvent triggers with deduplication, batch processing, priority calculation, and job creation. Added risk scoring metrics to PolicyEngineTelemetry (jobs_created, triggers_skipped, duration, findings_scored). Registered services in Program.cs DI. | Implementer |
| 2025-11-27 | `POLICY-RISK-66-004`: Added RiskProfile project reference to StellaOps.Policy library. Created `IRiskProfileRepository` interface with GetAsync, GetVersionAsync, GetLatestAsync, ListProfileIdsAsync, ListVersionsAsync, SaveAsync, DeleteVersionAsync, DeleteAllVersionsAsync, ExistsAsync. Created `InMemoryRiskProfileRepository` for testing/development. Created `RiskProfileDiagnostics` with comprehensive validation (RISK001-RISK050 error codes) covering structure, signals, weights, overrides, and inheritance. Includes `RiskProfileDiagnosticsReport` and `RiskProfileIssue` types. | Implementer |
| 2025-11-27 | `POLICY-RISK-66-003`: Added RiskProfile project reference to Policy Engine. Created `PolicyEngineRiskProfileOptions` with config for enabled, defaultProfileId, profileDirectory, maxInheritanceDepth, validateOnLoad, cacheResolvedProfiles, and inline profile definitions. Created `RiskProfileConfigurationService` for loading profiles from config/files, resolving inheritance, and providing profiles to engine. Updated `PolicyEngineBootstrapWorker` to load profiles at startup. Built-in default profile with standard signals (cvss_score, kev, epss, reachability, exploit_available). | Implementer |
| 2025-11-27 | `POLICY-RISK-66-002`: Created `Models/RiskProfileModel.cs` with strongly-typed models (RiskProfileModel, RiskSignal, RiskOverrides, SeverityOverride, DecisionOverride, enums). Created `Merge/RiskProfileMergeService.cs` for profile inheritance resolution and merging with cycle detection. Created `Hashing/RiskProfileHasher.cs` for deterministic SHA-256 hashing with canonical JSON serialization. | Implementer |
| 2025-11-27 | `POLICY-OBS-55-001`: Created `IncidentMode.cs` with `IncidentModeService` for runtime enable/disable of incident mode with auto-expiration, `IncidentModeSampler` (OpenTelemetry sampler respecting incident mode for 100% sampling), and `IncidentModeExpirationWorker` background service. Added `IncidentMode` option to telemetry config. Registered in Program.cs DI. | Implementer |
| 2025-11-27 | `POLICY-OBS-54-001`: Created `PolicyEvaluationAttestation.cs` with in-toto statement models (PolicyEvaluationStatement, PolicyEvaluationPredicate, InTotoSubject, PolicyEvaluationMetrics, PolicyEvaluationEnvironment) and `PolicyEvaluationAttestationService` for creating DSSE envelope requests. Added Attestor.Envelope project reference. Registered in Program.cs DI. | Implementer |
| 2025-11-27 | `POLICY-OBS-53-001`: Created `EvidenceBundle.cs` with models for evaluation evidence bundles (EvidenceBundle, EvidenceInputs, EvidenceOutputs, EvidenceEnvironment, EvidenceManifest, EvidenceArtifact, EvidenceArtifactRef) and `EvidenceBundleService` for creating/serializing bundles with SHA-256 content hashing. Registered in Program.cs DI. | Implementer |
| 2025-11-27 | `POLICY-OBS-52-001`: Created `PolicyTimelineEvents.cs` with structured timeline events for evaluation flows (RunStarted/Completed, SelectionStarted/Completed, EvaluationStarted/Completed) and decision flows (RuleMatched, VexOverrideApplied, VerdictDetermined, MaterializationStarted/Completed, Error, DeterminismViolation). Events include trace correlation and structured data. Registered in Program.cs DI. | Implementer |
| 2025-11-27 | `POLICY-OBS-51-001`: Added golden-signal metrics (Latency: `policy_api_latency_seconds`, `policy_evaluation_latency_seconds`; Traffic: `policy_requests_total`, `policy_evaluations_total`, `policy_findings_materialized_total`; Errors: `policy_errors_total`, `policy_api_errors_total`, `policy_evaluation_failures_total`; Saturation: `policy_concurrent_evaluations`, `policy_worker_utilization`) and SLO metrics (`policy_slo_burn_rate`, `policy_error_budget_remaining`, `policy_slo_violations_total`). | Implementer |
| 2025-11-27 | `POLICY-OBS-50-001`: Implemented telemetry core for Policy Engine. Added `PolicyEngineTelemetry.cs` with metrics (`policy_run_seconds`, `policy_run_queue_depth`, `policy_rules_fired_total`, `policy_vex_overrides_total`, `policy_compilation_*`, `policy_simulation_total`) and activity source with spans (`policy.select`, `policy.evaluate`, `policy.materialize`, `policy.simulate`, `policy.compile`). Created `TelemetryExtensions.cs` with OpenTelemetry + Serilog configuration. Wired into `Program.cs`. | Implementer |
| 2025-11-20 | Published risk profile library prep (docs/modules/policy/prep/2025-11-20-riskprofile-66-001-prep.md); set PREP-POLICY-RISK-66-001 to DOING. | Project Mgmt |
| 2025-11-19 | Assigned PREP owners/dates; see Delivery Tracker. | Planning |
| 2025-11-08 | Sprint stub; awaiting upstream phases. | Planning |
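
The `POLICY-RISK-66-002` entry above mentions inheritance resolution with cycle detection and deterministic SHA-256 hashing over canonical JSON. The Python sketch below shows one way those two ideas fit together. It is not the `RiskProfileMergeService` or `RiskProfileHasher` code: the `extends` field, the one-level merge rule, and the signal weights are assumptions made for illustration.

```python
import hashlib
import json


def resolve_profile(profile_id: str, profiles: dict, _seen: tuple = ()) -> dict:
    """Merge a profile with its ancestors; child values override parent values."""
    if profile_id in _seen:
        chain = " -> ".join(_seen + (profile_id,))
        raise ValueError(f"inheritance cycle: {chain}")
    profile = profiles[profile_id]
    parent_id = profile.get("extends")  # hypothetical field name
    merged = {}
    if parent_id is not None:
        merged = resolve_profile(parent_id, profiles, _seen + (profile_id,))
    for key, value in profile.items():
        if key == "extends":
            continue
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # One-level merge of nested maps such as signal weights.
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged


def profile_hash(profile: dict) -> str:
    """SHA-256 over a canonical form: sorted keys, fixed separators, UTF-8."""
    canonical = json.dumps(
        profile, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative profiles; the weights are invented for this example.
profiles = {
    "base": {"signals": {"cvss_score": 0.5, "kev": 0.2}},
    "strict": {"extends": "base", "signals": {"kev": 0.4}},
}
resolved = resolve_profile("strict", profiles)
print(resolved, profile_hash(resolved))
```

Hashing the resolved profile rather than the raw child means two profiles that merge to the same content share a hash, regardless of key order in the source documents.
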

@@ -17,8 +17,8 @@
## Delivery Tracker
| # | Task ID & handle | State | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| 1 | POLICY-RISK-67-002 | BLOCKED (2025-11-26) | Await risk profile contract + schema (67-001) and API shape. | Policy Guild / `src/Policy/StellaOps.Policy.Engine` | Risk profile lifecycle APIs. |
| 2 | POLICY-RISK-67-002 | BLOCKED (2025-11-26) | Depends on 67-001/67-002 spec; schema draft absent. | Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Publish `.well-known/risk-profile-schema` + CLI validation. |
| 1 | POLICY-RISK-67-002 | DONE (2025-11-27) | — | Policy Guild / `src/Policy/StellaOps.Policy.Engine` | Risk profile lifecycle APIs. |
| 2 | POLICY-RISK-67-002 | DONE (2025-11-27) | — | Risk Profile Schema Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Publish `.well-known/risk-profile-schema` + CLI validation. |
| 3 | POLICY-RISK-67-003 | BLOCKED (2025-11-26) | Blocked by 67-002 contract + simulation inputs. | Policy · Risk Engine Guild / `src/Policy/__Libraries/StellaOps.Policy` | Risk simulations + breakdowns. |
| 4 | POLICY-RISK-68-001 | BLOCKED (2025-11-26) | Blocked by 67-003 outputs and missing Policy Studio contract. | Policy · Policy Studio Guild / `src/Policy/StellaOps.Policy.Engine` | Simulation API for Policy Studio. |
| 5 | POLICY-RISK-68-001 | BLOCKED (2025-11-26) | Blocked until 68-001 API + Authority attachment rules defined. | Risk Profile Schema Guild · Authority Guild / `src/Policy/StellaOps.Policy.RiskProfile` | Scope selectors, precedence rules, Authority attachment. |
@@ -31,11 +31,13 @@
| 12 | POLICY-SPL-23-003 | DONE (2025-11-26) | Layering/override engine shipped; next step is explanation tree. | Policy Guild / `src/Policy/__Libraries/StellaOps.Policy` | Layering/override engine + tests. |
| 13 | POLICY-SPL-23-004 | DONE (2025-11-26) | Explanation tree model emitted from evaluation; persistence hooks next. | Policy · Audit Guild / `src/Policy/__Libraries/StellaOps.Policy` | Explanation tree model + persistence. |
| 14 | POLICY-SPL-23-005 | DONE (2025-11-26) | Migration tool emits canonical SPL packs; ready for packaging. | Policy · DevEx Guild / `src/Policy/__Libraries/StellaOps.Policy` | Migration tool to baseline SPL packs. |
| 15 | POLICY-SPL-24-001 | TODO | Depends on 23-005. | Policy · Signals Guild / `src/Policy/__Libraries/StellaOps.Policy` | Extend SPL with reachability/exploitability predicates. |
| 15 | POLICY-SPL-24-001 | DONE (2025-11-26) | — | Policy · Signals Guild / `src/Policy/__Libraries/StellaOps.Policy` | Extend SPL with reachability/exploitability predicates. |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-27 | `POLICY-RISK-67-002` (task 2): Added `RiskProfileSchemaEndpoints.cs` with `/.well-known/risk-profile-schema` endpoint (anonymous, ETag/Cache-Control, schema v1) and `/api/risk/schema/validate` POST endpoint for profile validation. Extended `RiskProfileSchemaProvider` with GetSchemaText(), GetSchemaVersion(), and GetETag() methods. Added `risk-profile` CLI command group with `validate` (--input, --format, --output, --strict) and `schema` (--output) subcommands. Added RiskProfile project reference to CLI. | Implementer |
| 2025-11-27 | `POLICY-RISK-67-002` (task 1): Created `Endpoints/RiskProfileEndpoints.cs` with REST APIs for profile lifecycle management: ListProfiles, GetProfile, ListVersions, GetVersion, CreateProfile (draft), ActivateProfile, DeprecateProfile, ArchiveProfile, GetProfileEvents, CompareProfiles, GetProfileHash. Uses `RiskProfileLifecycleService` for status transitions and `RiskProfileConfigurationService` for profile storage/hashing. Authorization via StellaOpsScopes (PolicyRead/PolicyEdit/PolicyActivate). Registered `RiskProfileLifecycleService` in DI and wired up `MapRiskProfiles()` in Program.cs. | Implementer |
| 2025-11-25 | Delivered SPL v1 schema + sample fixtures (spl-schema@1.json, spl-sample@1.json, SplSchemaResource) and embedded in `StellaOps.Policy`; marked POLICY-SPL-23-001 DONE. | Implementer |
| 2025-11-26 | Implemented SPL canonicalizer + SHA-256 digest (order-stable statements/actions/conditions) with unit tests; marked POLICY-SPL-23-002 DONE. | Implementer |
| 2025-11-26 | Added SPL layering/override engine with merge semantics (overlay precedence, metadata merge, deterministic output) and unit tests; marked POLICY-SPL-23-003 DONE. | Implementer |
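
The `/.well-known/risk-profile-schema` entry above mentions ETag and Cache-Control handling. A minimal Python sketch of conditional responses follows. It assumes the ETag is derived from a SHA-256 of the schema text and uses an invented `max-age`; the log does not say how `GetETag()` computes its value, and this simplified comparison ignores weak validators and `*`.

```python
import hashlib
from typing import Optional


def compute_etag(schema_text: str) -> str:
    """Derive a quoted ETag from the schema content (assumed scheme)."""
    digest = hashlib.sha256(schema_text.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'


def respond(schema_text: str, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a conditional GET of the schema."""
    etag = compute_etag(schema_text)
    # The max-age value is illustrative, not taken from the source.
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            # The client copy is current: send headers only.
            return 304, headers, ""
    return 200, headers, schema_text


status, headers, body = respond('{"title": "risk-profile"}')
print(status, headers["ETag"])
print(respond('{"title": "risk-profile"}', headers["ETag"])[0])
```
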

@@ -35,7 +35,7 @@
| 3 | CLI-REPLAY-187-002 | BLOCKED | PREP-CLI-REPLAY-187-002-WAITING-ON-EVIDENCELO | CLI Guild | Add CLI `scan --record`, `verify`, `replay`, `diff` with offline bundle resolution; align golden tests. |
| 4 | RUNBOOK-REPLAY-187-004 | BLOCKED | PREP-RUNBOOK-REPLAY-187-004-DEPENDS-ON-RETENT | Docs Guild · Ops Guild | Publish `/docs/runbooks/replay_ops.md` coverage for retention enforcement, RootPack rotation, verification drills. |
| 5 | CRYPTO-REGISTRY-DECISION-161 | DONE | Decision recorded in `docs/security/crypto-registry-decision-2025-11-18.md`; publish contract defaults. | Security Guild · Evidence Locker Guild | Capture decision from 2025-11-18 review; emit changelog + reference implementation for downstream parity. |
| 6 | EVID-CRYPTO-90-001 | TODO | Apply registry defaults and wire `ICryptoProviderRegistry` into EvidenceLocker paths. | Evidence Locker Guild · Security Guild | Route hashing/signing/bundle encryption through `ICryptoProviderRegistry`/`ICryptoHash` for sovereign crypto providers. |
| 6 | EVID-CRYPTO-90-001 | DONE | Implemented; `MerkleTreeCalculator` now uses `ICryptoProviderRegistry` for sovereign crypto routing. | Evidence Locker Guild · Security Guild | Route hashing/signing/bundle encryption through `ICryptoProviderRegistry`/`ICryptoHash` for sovereign crypto providers. |
## Action Tracker
| Action | Owner(s) | Due | Status |
@@ -84,3 +84,4 @@
| 2025-11-18 | Started EVID-OBS-54-002 with shared schema; replay/CLI remain pending ledger shape. | Implementer |
| 2025-11-20 | Completed PREP-EVID-REPLAY-187-001, PREP-CLI-REPLAY-187-002, and PREP-RUNBOOK-REPLAY-187-004; published prep docs at `docs/modules/evidence-locker/replay-payload-contract.md`, `docs/modules/cli/guides/replay-cli-prep.md`, and `docs/runbooks/replay_ops_prep_187_004.md`. | Implementer |
| 2025-11-20 | Added schema readiness and replay delivery prep notes for Evidence Locker Guild; see `docs/modules/evidence-locker/prep/2025-11-20-schema-readiness-blockers.md` and `.../2025-11-20-replay-delivery-sync.md`. Marked PREP-EVIDENCE-LOCKER-GUILD-BLOCKED-SCHEMAS-NO and PREP-EVIDENCE-LOCKER-GUILD-REPLAY-DELIVERY-GU DONE. | Implementer |
| 2025-11-27 | Completed EVID-CRYPTO-90-001: Extended `ICryptoProviderRegistry` with `ContentHashing` capability and `ResolveHasher` method; created `ICryptoHasher` interface with `DefaultCryptoHasher` implementation; wired `MerkleTreeCalculator` to use crypto registry for sovereign crypto routing; added `EvidenceCryptoOptions` for algorithm/provider configuration. | Implementer |
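
The EVID-CRYPTO-90-001 entry above describes routing Merkle tree hashing through a resolvable hasher instead of a fixed algorithm. The Python sketch below shows the general shape of that idea with an injectable hash function. It is a generic construction and not the `MerkleTreeCalculator` implementation: the leaf and node prefixes and the duplicate-last-node rule for odd levels are assumptions.

```python
import hashlib
from typing import Callable, Sequence

# A hasher maps bytes to a digest; swapping it changes the algorithm in one place.
Hasher = Callable[[bytes], bytes]


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes], hasher: Hasher = sha256) -> bytes:
    """Compute a Merkle root over leaves using the supplied hasher."""
    if not leaves:
        return hasher(b"")
    # Prefix leaves and interior nodes differently so they cannot collide.
    level = [hasher(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        next_level = []
        for index in range(0, len(level), 2):
            left = level[index]
            # Assumed rule: an unpaired last node is paired with itself.
            right = level[index + 1] if index + 1 < len(level) else left
            next_level.append(hasher(b"\x01" + left + right))
        level = next_level
    return level[0]


# Same leaves, different hasher: only the injected function changes.
entries = [b"manifest.json", b"bundle.tar", b"signature.dsse"]
print(merkle_root(entries).hex())
print(merkle_root(entries, hasher=lambda data: hashlib.sha512(data).digest()).hex())
```

Passing the hasher as a parameter mirrors the registry approach in spirit: callers that need a regional algorithm supply a different function without touching the tree logic.
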

@@ -20,23 +20,38 @@
| --- | --- | --- | --- | --- | --- |
| 1 | NOTIFY-SVC-37-001 | DONE (2025-11-24) | Contract published at `docs/api/notify-openapi.yaml` and `src/Notifier/StellaOps.Notifier/StellaOps.Notifier.WebService/openapi/notify-openapi.yaml`. | Notifications Service Guild (`src/Notifier/StellaOps.Notifier`) | Define pack approval & policy notification contract (OpenAPI schema, event payloads, resume tokens, security guidance). |
| 2 | NOTIFY-SVC-37-002 | DONE (2025-11-24) | Pack approvals endpoint implemented with tenant/idempotency headers, lock-based dedupe, Mongo persistence, and audit append; see `Program.cs` + storage migrations. | Notifications Service Guild | Implement secure ingestion endpoint, Mongo persistence (`pack_approvals`), idempotent writes, audit trail. |
| 3 | NOTIFY-SVC-37-003 | DOING (2025-11-24) | Pack approval templates + default channels/rule seeded via hosted seeder; validation tests added (`PackApprovalTemplateTests`, `PackApprovalTemplateSeederTests`). Next: hook dispatch/rendering. | Notifications Service Guild | Approval/policy templates, routing predicates, channel dispatch (email/webhook), localization + redaction. |
| 3 | NOTIFY-SVC-37-003 | DONE (2025-11-27) | Dispatch/rendering layer complete: `INotifyTemplateRenderer`/`SimpleTemplateRenderer` (Handlebars-style {{variable}} + {{#each}}, sensitive key redaction), `INotifyChannelDispatcher`/`WebhookChannelDispatcher` (Slack/webhook with retry), `DeliveryDispatchWorker` (BackgroundService), DI wiring in Program.cs, options + tests. | Notifications Service Guild | Approval/policy templates, routing predicates, channel dispatch (email/webhook), localization + redaction. |
| 4 | NOTIFY-SVC-37-004 | DONE (2025-11-24) | Test harness stabilized with in-memory stores; OpenAPI stub returns scope/etag; pack-approvals ack path exercised. | Notifications Service Guild | Acknowledgement API, Task Runner callback client, metrics for outstanding approvals, runbook updates. |
| 5 | NOTIFY-SVC-38-002 | TODO | Depends on 37-004. | Notifications Service Guild | Channel adapters (email, chat webhook, generic webhook) with retry policies, health checks, audit logging. |
| 6 | NOTIFY-SVC-38-003 | TODO | Depends on 38-002. | Notifications Service Guild | Template service (versioned templates, localization scaffolding) and renderer (redaction allowlists, Markdown/HTML/JSON, provenance links). |
| 7 | NOTIFY-SVC-38-004 | TODO | Depends on 38-003. | Notifications Service Guild | REST + WS APIs (rules CRUD, templates preview, incidents list, ack) with audit logging, RBAC, live feed stream. |
| 8 | NOTIFY-SVC-39-001 | TODO | Depends on 38-004. | Notifications Service Guild | Correlation engine with pluggable key expressions/windows, throttler, quiet hours/maintenance evaluator, incident lifecycle. |
| 9 | NOTIFY-SVC-39-002 | TODO | Depends on 39-001. | Notifications Service Guild | Digest generator (queries, formatting) with schedule runner and distribution. |
| 10 | NOTIFY-SVC-39-003 | TODO | Depends on 39-002. | Notifications Service Guild | Simulation engine/API to dry-run rules against historical events, returning matched actions with explanations. |
| 11 | NOTIFY-SVC-39-004 | TODO | Depends on 39-003. | Notifications Service Guild | Quiet hour calendars + default throttles with audit logging and operator overrides. |
| 12 | NOTIFY-SVC-40-001 | TODO | Depends on 39-004. | Notifications Service Guild | Escalations + on-call schedules, ack bridge, PagerDuty/OpsGenie adapters, CLI/in-app inbox channels. |
| 13 | NOTIFY-SVC-40-002 | TODO | Depends on 40-001. | Notifications Service Guild | Summary storm breaker notifications, localization bundles, fallback handling. |
| 14 | NOTIFY-SVC-40-003 | TODO | Depends on 40-002. | Notifications Service Guild | Security hardening: signed ack links (KMS), webhook HMAC/IP allowlists, tenant isolation fuzz tests, HTML sanitization. |
| 15 | NOTIFY-SVC-40-004 | TODO | Depends on 40-003. | Notifications Service Guild | Observability (metrics/traces for escalations/latency), dead-letter handling, chaos tests for channel outages, retention policies. |
| 5 | NOTIFY-SVC-38-002 | DONE (2025-11-27) | Channel adapters complete: `IChannelAdapter`, `WebhookChannelAdapter`, `EmailChannelAdapter`, `ChatWebhookChannelAdapter` with retry policies (exponential backoff + jitter), health checks, audit logging, HMAC signing, `ChannelAdapterFactory` DI registration. Tests at `StellaOps.Notifier.Tests/Channels/`. | Notifications Service Guild | Channel adapters (email, chat webhook, generic webhook) with retry policies, health checks, audit logging. |
| 6 | NOTIFY-SVC-38-003 | DONE (2025-11-27) | Template service complete: `INotifyTemplateService`/`NotifyTemplateService` (locale fallback chain, versioning, CRUD with audit), `EnhancedTemplateRenderer` (configurable redaction allowlists/denylists, Markdown/HTML/JSON/PlainText format conversion, provenance links, {{#if}} conditionals, format specifiers), `TemplateRendererOptions`, DI registration via `AddTemplateServices()`. Tests at `StellaOps.Notifier.Tests/Templates/`. | Notifications Service Guild | Template service (versioned templates, localization scaffolding) and renderer (redaction allowlists, Markdown/HTML/JSON, provenance links). |
| 7 | NOTIFY-SVC-38-004 | DONE (2025-11-27) | REST APIs complete: `/api/v2/notify/rules` (CRUD), `/api/v2/notify/templates` (CRUD + preview + validate), `/api/v2/notify/incidents` (list + ack + resolve). Contract DTOs at `Contracts/RuleContracts.cs`, `TemplateContracts.cs`, `IncidentContracts.cs`. Endpoints via `MapNotifyApiV2()` extension. Audit logging on all mutations. Tests at `StellaOps.Notifier.Tests/Endpoints/`. | Notifications Service Guild | REST + WS APIs (rules CRUD, templates preview, incidents list, ack) with audit logging, RBAC, live feed stream. |
| 8 | NOTIFY-SVC-39-001 | DONE (2025-11-27) | Correlation engine complete: `ICorrelationEngine`/`CorrelationEngine` (orchestrates key building, incident management, throttling, quiet hours), `ICorrelationKeyBuilder` interface with `CompositeCorrelationKeyBuilder` (tenant+kind+payload fields), `TemplateCorrelationKeyBuilder` (template expressions), `CorrelationKeyBuilderFactory`. `INotifyThrottler`/`InMemoryNotifyThrottler` (sliding window throttling). `IQuietHoursEvaluator`/`QuietHoursEvaluator` (quiet hours schedules, maintenance windows). `IIncidentManager`/`InMemoryIncidentManager` (incident lifecycle: open/acknowledged/resolved). Notification policies (FirstOnly, EveryEvent, OnEscalation, Periodic). DI registration via `AddCorrelationServices()`. Comprehensive tests at `StellaOps.Notifier.Tests/Correlation/`. | Notifications Service Guild | Correlation engine with pluggable key expressions/windows, throttler, quiet hours/maintenance evaluator, incident lifecycle. |
| 9 | NOTIFY-SVC-39-002 | DONE (2025-11-27) | Digest generator complete: `IDigestGenerator`/`DigestGenerator` (queries incidents, calculates summary statistics, builds timeline, renders to Markdown/HTML/PlainText/JSON), `IDigestScheduler`/`InMemoryDigestScheduler` (cron-based scheduling with Cronos, timezone support, next-run calculation), `DigestScheduleRunner` BackgroundService (concurrent schedule execution with semaphore limiting), `IDigestDistributor`/`DigestDistributor` (webhook/Slack/Teams/email distribution with format-specific payloads). DTOs: `DigestQuery`, `DigestContent`, `DigestSummary`, `DigestIncident`, `EventKindSummary`, `TimelineEntry`, `DigestSchedule`, `DigestRecipient`. DI registration via `AddDigestServices()` with `DigestServiceBuilder`. Tests at `StellaOps.Notifier.Tests/Digest/`. | Notifications Service Guild | Digest generator (queries, formatting) with schedule runner and distribution. |
| 10 | NOTIFY-SVC-39-003 | DONE (2025-11-27) | Simulation engine complete: `ISimulationEngine`/`SimulationEngine` (dry-runs rules against events without side effects, evaluates all rules against all events, builds detailed match/non-match explanations), `SimulationRequest`/`SimulationResult` DTOs with `SimulationEventResult`, `SimulationRuleMatch`, `SimulationActionMatch`, `SimulationRuleNonMatch`, `SimulationRuleSummary`. Rule validation via `ValidateRuleAsync` with error/warning detection (missing fields, broad matches, unknown severities, disabled actions). API endpoint at `/api/v2/simulate` (POST for simulation, POST /validate for rule validation) via `SimulationEndpoints.cs`. DI registration via `AddSimulationServices()`. Tests at `StellaOps.Notifier.Tests/Simulation/SimulationEngineTests.cs`. | Notifications Service Guild | Simulation engine/API to dry-run rules against historical events, returning matched actions with explanations. |
| 11 | NOTIFY-SVC-39-004 | DONE (2025-11-27) | Quiet hour calendars, throttle configs, audit logging, and operator overrides implemented. | Notifications Service Guild | Quiet hour calendars + default throttles with audit logging and operator overrides. |
| 12 | NOTIFY-SVC-40-001 | DONE (2025-11-27) | Escalation and on-call systems complete. | Notifications Service Guild | Escalations + on-call schedules, ack bridge, PagerDuty/OpsGenie adapters, CLI/in-app inbox channels. |
| 13 | NOTIFY-SVC-40-002 | DONE (2025-11-27) | Storm breaker, localization, and fallback services complete. | Notifications Service Guild | Summary storm breaker notifications, localization bundles, fallback handling. |
| 14 | NOTIFY-SVC-40-003 | DONE (2025-11-27) | Security services complete. | Notifications Service Guild | Security hardening: signed ack links (KMS), webhook HMAC/IP allowlists, tenant isolation fuzz tests, HTML sanitization. |
| 15 | NOTIFY-SVC-40-004 | DONE (2025-11-27) | Observability stack complete. | Notifications Service Guild | Observability (metrics/traces for escalations/latency), dead-letter handling, chaos tests for channel outages, retention policies. |
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-27 | Completed observability and chaos tests (NOTIFY-SVC-40-004): Implemented comprehensive observability stack for the Notifier module. **Metrics Service** (`INotifierMetrics`/`DefaultNotifierMetrics`): Uses System.Diagnostics.Metrics API with counters for delivery attempts, escalations, storm events, fallbacks, dead-letters; histograms for delivery latency, acknowledgment latency; observable gauges for active escalations/storms/pending deliveries. `NotifierMetricsSnapshot` provides point-in-time metrics with tenant filtering. Configuration via `NotifierMetricsOptions` (Enabled, MeterName, SamplingInterval, HistogramBuckets). **Tracing Service** (`INotifierTracing`/`DefaultNotifierTracing`): Uses System.Diagnostics.Activity API (OpenTelemetry compatible) for distributed tracing. Span types: delivery, escalation, digest, template render, correlation, webhook validation. Helper methods: `AddEvent()`, `SetError()`, `SetOk()`, `AddTags()`, `StartLinkedSpan()`. Extension methods for recording delivery results, escalation levels, storm detection, fallbacks, template renders, correlation results. Configuration via `NotifierTracingOptions` (Enabled, SourceName, IncludeSensitiveData, SamplingRatio, MaxAttributesPerSpan, MaxEventsPerSpan). **Dead Letter Handler** (`IDeadLetterHandler`/`InMemoryDeadLetterHandler`): Queue for failed notifications with entry lifecycle (Pending→PendingRetry→Retried, Discarded). Operations: `DeadLetterAsync()`, `RetryAsync()` (with retry limits), `DiscardAsync()`, `GetEntriesAsync()` (with status/channel filtering, pagination), `GetStatisticsAsync()` (totals, breakdown by channel/reason), `PurgeAsync()` (cleanup old entries). Observer pattern via `Subscribe()`/`IDeadLetterObserver` for real-time notifications. Configuration via `DeadLetterOptions` (Enabled, MaxRetries, RetryDelay, MaxEntriesPerTenant). **Chaos Test Runner** (`IChaosTestRunner`/`InMemoryChaosTestRunner`): Fault injection framework for resilience testing. Fault types: Outage (complete failure), PartialFailure (percentage-based), Latency (delay injection), Intermittent (random failures), RateLimit (throttling), Timeout, ErrorResponse (specific HTTP codes), CorruptResponse. Experiment lifecycle: create, start, stop, cleanup. `ShouldFailAsync()` checks active experiments and returns `ChaosDecision` with fault details. Outcome recording and statistics. Configuration via `ChaosTestOptions` (Enabled, MaxConcurrentExperiments, MaxExperimentDuration, RequireTenantTarget, AllowedInitiators). **Retention Policy Service** (`IRetentionPolicyService`/`InMemoryRetentionPolicyService`): Data cleanup policies for delivery logs, escalations, storm events, dead letters, audit logs, metrics, traces, chaos experiments, isolation violations, webhook logs, template cache. Actions: Delete, Archive, Compress, FlagForReview. Features: cron-based scheduling, tenant scoping, execution history, preview before execute, pluggable `IRetentionHandler` per data type, filters by channel/status/severity/tags. Configuration via `RetentionPolicyOptions` (Enabled, DefaultRetentionPeriod, Min/MaxRetentionPeriod, DefaultBatchSize, ExecutionHistoryRetention, DefaultPeriods per data type). **REST APIs** via `ObservabilityEndpoints.cs`: `/api/v1/observability/metrics` (GET snapshot), `/metrics/{tenantId}` (tenant-specific), `/dead-letters/{tenantId}` (list/get/retry/discard/stats/purge), `/chaos/experiments` (list/get/start/stop/results), `/retention/policies` (CRUD/execute/preview/history). **DI Registration** via `AddNotifierObservabilityServices()`. Updated `Program.cs` with service and endpoint registration. **Tests**: `ChaosTestRunnerTests` (18 tests covering experiment lifecycle, fault types, rate limiting, expiration), `RetentionPolicyServiceTests` (16 tests covering policy CRUD, execution, preview, history), `DeadLetterHandlerTests` (16 tests covering entry lifecycle, filtering, statistics, observers). NOTIFY-SVC-40-004 marked DONE. | Implementer |
| 2025-11-27 | Completed security hardening (NOTIFY-SVC-40-003): Implemented comprehensive security services for the Notifier module. **Signing Service** (`ISigningService`/`SigningService`): JWT-like token generation with header/body/signature structure, HMAC-SHA256 signing, Base64URL encoding, key rotation support via `ISigningKeyProvider` interface. `LocalSigningKeyProvider` for in-memory key management with retention period and automatic cleanup. Token verification with expiry checking, key lookup, and constant-time signature comparison. `SigningPayload` record with TokenId, Purpose, TenantId, Subject, Target, ExpiresAt, and custom Claims. `SigningVerificationResult` with IsValid, Payload, Error, and ErrorCode (InvalidFormat, InvalidSignature, Expired, InvalidPayload, KeyNotFound, Revoked). Configuration via `SigningServiceOptions` (KeyProvider type, LocalSigningKey, Algorithm, DefaultExpiry, KeyRotationInterval, KeyRetentionPeriod, KMS/Azure/GCP URLs for future cloud provider support). **Webhook Security Service** (`IWebhookSecurityService`/`InMemoryWebhookSecurityService`): HMAC signature validation (SHA256/SHA384/SHA512), configurable signature formats (hex/base64/base64url) with optional prefixes (e.g., "sha256=" for Slack), IP allowlist with CIDR subnet matching, replay protection with nonce caching and timestamp validation, known provider IP ranges for Slack/GitHub/PagerDuty. `WebhookSecurityConfig` record with ConfigId, TenantId, ChannelId, SecretKey, Algorithm, SignatureHeader, SignatureFormat, TimestampHeader, MaxRequestAge, AllowedIps, RequireSignature. `WebhookValidationResult` with IsValid, Errors, Warnings, PassedChecks, FailedChecks flags (SignatureValid, IpAllowed, NotExpired, NotReplay). **HTML Sanitizer** (`IHtmlSanitizer`/`DefaultHtmlSanitizer`): Regex-based HTML sanitization with configurable profiles (Minimal, Basic, Rich, Email). Removes script tags, event handlers (onclick, onerror, etc.), javascript: URLs. Tag/attribute allowlists with global and tag-specific rules. CSS property allowlists for style attributes. URL scheme validation (http, https, mailto, tel). Comment stripping, content length limits, nesting depth limits. `SanitizationProfile` record defining AllowedTags, AllowedAttributes, AllowedUrlSchemes, AllowedCssProperties, MaxNestingDepth, MaxContentLength. `HtmlValidationResult` with error types (DisallowedTag, DisallowedAttribute, ScriptDetected, EventHandlerDetected, JavaScriptUrlDetected). Utilities: `EscapeHtml()`, `StripTags()`. Custom profile registration. **Tenant Isolation Validator** (`ITenantIsolationValidator`/`InMemoryTenantIsolationValidator`): Resource-level tenant isolation with registration tracking. Validates access to deliveries, channels, templates, subscriptions. Admin tenant bypass patterns (regex-based). System resource type bypass. Cross-tenant access grants with operation restrictions (Read, Write, Delete, Execute, Share flags), expiration support, and auditable grant/revoke operations. Violation recording with severity levels (Low, Medium, High, Critical based on operation type). Built-in fuzz testing via `RunFuzzTestAsync()` with configurable iterations, tenant IDs, resource types, cross-tenant grant testing, and edge case testing. `TenantFuzzTestResult` with pass/fail counts, execution time, and detailed failure information. **REST APIs** via `SecurityEndpoints.cs`: `/api/v2/security/tokens/sign` (POST), `/tokens/verify` (POST), `/tokens/{token}/info` (GET), `/keys/rotate` (POST), `/webhooks` (POST register, GET config), `/webhooks/validate` (POST), `/webhooks/{tenantId}/{channelId}/allowlist` (PUT), `/html/sanitize` (POST), `/html/validate` (POST), `/html/strip` (POST), `/tenants/validate` (POST), `/tenants/{tenantId}/violations` (GET), `/tenants/fuzz-test` (POST), `/tenants/grants` (POST grant, DELETE revoke). **DI Registration** via `AddNotifierSecurityServices()` with `SecurityServiceBuilder` for in-memory or persistent providers. Options classes: `SigningServiceOptions`, `WebhookSecurityOptions`, `HtmlSanitizerOptions`, `TenantIsolationOptions`. Updated `Program.cs` with service and endpoint registration. **Tests**: `SigningServiceTests` (9 tests covering sign/verify, expiry, tampering, key rotation), `LocalSigningKeyProviderTests` (5 tests), `WebhookSecurityServiceTests` (12 tests covering HMAC validation, IP allowlists, replay protection), `HtmlSanitizerTests` (22 tests covering tag/attribute filtering, XSS prevention, profiles), `TenantIsolationValidatorTests` (17 tests covering access validation, grants, fuzz testing). NOTIFY-SVC-40-003 marked DONE. | Implementer |
| 2025-11-27 | Completed storm breaker, localization, and fallback handling (NOTIFY-SVC-40-002): Implemented `IStormBreaker`/`InMemoryStormBreaker` for notification storm detection and consolidation (configurable thresholds per event-kind, sliding window tracking, storm state management, automatic suppression with periodic summaries, cooldown-based storm ending). Storm detection tracks event rates and consolidates high-volume notifications into summary notifications sent at configurable intervals. Created `ILocalizationService`/`InMemoryLocalizationService` for multi-locale notification content management with bundle-based storage (tenant-scoped + system bundles), locale fallback chains (e.g., de-AT → de-DE → de → en-US), named placeholder substitution with locale-aware formatting (numbers, dates), caching with configurable TTL, and seeded system bundles for en-US, de-DE, fr-FR covering storm/fallback/escalation/digest strings. Implemented `IFallbackHandler`/`InMemoryFallbackHandler` for channel fallback routing when primary channels fail (configurable fallback chains per channel type, tenant-specific chain overrides, delivery state tracking, max attempt limiting, statistics collection for success/failure/exhaustion rates). REST APIs: `/api/v2/storm-breaker/storms` (list active storms, get state, generate summary, clear), `/api/v2/localization/bundles` (CRUD, validate), `/api/v2/localization/strings/{key}` (get/format), `/api/v2/localization/locales` (list supported), `/api/v2/fallback/statistics` (get stats), `/api/v2/fallback/chains/{channelType}` (get/set), `/api/v2/fallback/test` (test resolution). Options classes: `StormBreakerOptions` (threshold, window, summary interval, cooldown, event-kind overrides), `LocalizationServiceOptions` (default locale, fallback chains, caching, placeholder format), `FallbackHandlerOptions` (max attempts, default chains, state retention, exhaustion notification). DI registration via `AddStormBreakerServices()` with `StormBreakerServiceBuilder` for custom implementations. Endpoints via `StormBreakerEndpoints.cs`, `LocalizationEndpoints.cs`, `FallbackEndpoints.cs`. Updated `Program.cs` with service and endpoint registration. Tests: `InMemoryStormBreakerTests` (14 tests covering detection, suppression, summaries, thresholds), `InMemoryLocalizationServiceTests` (17 tests covering bundles, fallback, formatting), `InMemoryFallbackHandlerTests` (15 tests covering chains, statistics, exhaustion). NOTIFY-SVC-40-002 marked DONE. | Implementer |
| 2025-11-27 | Completed escalation and on-call schedules (NOTIFY-SVC-40-001): Implemented escalation engine (`IEscalationEngine`/`EscalationEngine`) for incident escalation with level-based notification, acknowledgment processing, cycle management (restart/repeat/stop), and timeout handling. Created `IEscalationPolicyService`/`InMemoryEscalationPolicyService` for policy CRUD (levels, targets, exhausted actions, max cycles). Implemented `IOnCallScheduleService`/`InMemoryOnCallScheduleService` for on-call schedule management with rotation layers (daily/weekly/custom), handoff times, restrictions (day-of-week, time-of-day), and override support. Created `IAckBridge`/`AckBridge` for processing acknowledgments from multiple sources (signed links, PagerDuty, OpsGenie, Slack, Teams, email, CLI, in-app) with HMAC-signed token generation and validation. Added `PagerDutyAdapter` (Events API v2 integration with dedup keys, severity mapping, trigger/acknowledge/resolve actions, webhook parsing) and `OpsGenieAdapter` (Alert API v2 integration, priority mapping, alert lifecycle, webhook parsing). Implemented `IInboxChannel`/`InAppInboxChannel`/`CliNotificationChannel` for inbox-style notifications with priority ordering, read/unread tracking, expiration handling, query filtering (type, priority, limit), and CLI formatting. Created `IExternalIntegrationAdapter` interface for bi-directional integration (create incidents, parse webhooks). REST APIs via `EscalationEndpoints.cs`: `/api/v2/escalation-policies` (CRUD), `/api/v2/oncall-schedules` (CRUD + on-call lookup + overrides), `/api/v2/escalations` (active escalation management, manual escalate/stop), `/api/v2/ack` (acknowledgment processing + PagerDuty/OpsGenie webhook endpoints). DI registration via `AddEscalationServices()`, `AddPagerDutyIntegration()`, `AddOpsGenieIntegration()`. Updated `Program.cs` with service registration and endpoint mapping. Tests: `EscalationPolicyServiceTests` (14 tests), `EscalationEngineTests` (14 tests), `AckBridgeTests` (13 tests), `InboxChannelTests` (22 tests). NOTIFY-SVC-40-001 marked DONE. | Implementer |
| 2025-11-27 | Extended NOTIFY-SVC-39-004 with REST APIs: Added `/api/v2/quiet-hours/calendars` endpoints (`QuietHoursEndpoints.cs`) for calendar CRUD operations (list, get, create, update, delete) plus `/evaluate` for checking quiet hours status. Created `/api/v2/throttles/config` endpoints (`ThrottleEndpoints.cs`) for throttle configuration CRUD plus `/evaluate` for effective throttle duration lookup. Added `/api/v2/overrides` endpoints (`OperatorOverrideEndpoints.cs`) for override management (list, get, create, revoke) plus `/check` for checking applicable overrides. Created `IQuietHoursCalendarService`/`InMemoryQuietHoursCalendarService` (tenant calendars with named schedules, event-kind filtering, priority ordering, timezone support, overnight window handling). Created `IThrottleConfigurationService`/`InMemoryThrottleConfigurationService` (default durations, event-kind prefix matching for overrides, burst limiting). API request/response DTOs for all endpoints. DI registration via `AddQuietHoursServices()`. Endpoint mapping in `Program.cs`. Additional tests: `QuietHoursCalendarServiceTests` (15 tests covering calendar CRUD, schedule evaluation, day-of-week filtering, priority ordering), `ThrottleConfigurationServiceTests` (14 tests covering config CRUD, prefix matching, audit logging). | Implementer |
| 2025-11-27 | Completed quiet hour calendars and default throttles (NOTIFY-SVC-39-004): implemented `IQuietHourCalendarService`/`InMemoryQuietHourCalendarService` (per-tenant calendar management, multiple named schedules per calendar, priority-based evaluation, scope/event-kind filtering, timezone support, day-of-week/specific-date scheduling). Created `IThrottleConfigService`/`InMemoryThrottleConfigService` for hierarchical throttle configuration (global → tenant → event-kind pattern matching, burst allowance, cooldown periods, wildcard/prefix patterns). Implemented `ISuppressionAuditLogger`/`InMemorySuppressionAuditLogger` (comprehensive audit logging for all suppression config changes with filtering by time/action/actor/resource). Created `IOperatorOverrideService`/`InMemoryOperatorOverrideService` (temporary overrides to bypass quiet hours/throttling/maintenance, duration limits, usage counting, expiration handling, revocation). DTOs: `QuietHourCalendar`, `CalendarSchedule`, `CalendarEvaluationResult`, `TenantThrottleConfig`, `EventKindThrottleConfig`, `EffectiveThrottleConfig`, `SuppressionAuditEntry`, `OperatorOverride`, `OverrideCheckResult`. Configuration via `SuppressionAuditOptions`, `OperatorOverrideOptions`. Updated `CorrelationServiceExtensions` with DI registration for all new services and builder methods. Tests: `QuietHourCalendarServiceTests` (14 tests), `ThrottleConfigServiceTests` (15 tests), `OperatorOverrideServiceTests` (17 tests), `SuppressionAuditLoggerTests` (11 tests). NOTIFY-SVC-39-004 marked DONE. | Implementer |
| 2025-11-27 | Completed simulation engine (NOTIFY-SVC-39-003): implemented `ISimulationEngine`/`SimulationEngine` that evaluates rules against events without side effects. Core functionality: accepts events from request or tenant rules from repository, evaluates each event against each rule using `INotifyRuleEvaluator`, builds detailed match results with action explanations (channel availability, template assignment, throttle settings), and non-match explanations (event kind mismatch, severity below threshold, label mismatch, etc.). Created comprehensive DTOs: `SimulationRequest` (tenant, events, rules, filters, options), `SimulationResult` (totals, event results, rule summaries, duration), `SimulationEventResult`, `SimulationRuleMatch`, `SimulationActionMatch`, `SimulationRuleNonMatch`, `SimulationRuleSummary`, `NonMatchReasonSummary`. Implemented rule validation via `ValidateRuleAsync` with error detection (missing required fields) and warning detection (broad matches, unknown severities, no enabled actions, disabled rules). REST API at `/api/v2/simulate` (POST main simulation, POST /validate for rule validation) via `SimulationEndpoints.cs` with request/response mapping. DI registration via `AddSimulationServices()`. Tests: `SimulationEngineTests` (13 tests covering matching, non-matching, rule summaries, filtering, validation). NOTIFY-SVC-39-003 marked DONE. | Implementer |
| 2025-11-27 | Completed digest generator (NOTIFY-SVC-39-002): implemented `IDigestGenerator`/`DigestGenerator` that queries incidents from `IIncidentManager`, calculates summary statistics (total/new/acknowledged/resolved counts, total events, average resolution time, median acknowledge time), builds event kind summaries with percentages, and generates activity timelines. Multi-format rendering: Markdown (tables, status badges), HTML (styled document with tables and cards), PlainText (ASCII-formatted), and JSON (serialized content). Created `IDigestScheduler`/`InMemoryDigestScheduler` for managing digest schedules with cron expressions (using Cronos library), timezone support, and automatic next-run calculation. Implemented `DigestScheduleRunner` BackgroundService with configurable check intervals and semaphore-limited concurrent execution. Created `IDigestDistributor`/`DigestDistributor` supporting webhook (JSON payload), Slack (blocks-based messages), Teams (Adaptive Cards), and email delivery. Configuration via `DigestOptions`, `DigestSchedulerOptions`, `DigestDistributorOptions`. DI registration via `AddDigestServices()` with `DigestServiceBuilder` for customization. Tests: `DigestGeneratorTests` (rendering, statistics, filtering), `DigestSchedulerTests` (scheduling, cron, timezone). NOTIFY-SVC-39-002 marked DONE. | Implementer |
| 2025-11-27 | Completed correlation engine (NOTIFY-SVC-39-001): implemented `ICorrelationEngine`/`CorrelationEngine` that orchestrates key building, incident management, throttling, and quiet hours evaluation. Created `ICorrelationKeyBuilder` interface with `CompositeCorrelationKeyBuilder` (builds keys from tenant+kind+payload fields using SHA256 hashing) and `TemplateCorrelationKeyBuilder` (builds keys from template strings with variable substitution). Implemented `INotifyThrottler`/`InMemoryNotifyThrottler` with sliding window algorithm for rate limiting. Created `IQuietHoursEvaluator`/`QuietHoursEvaluator` supporting scheduled quiet hours (overnight windows, day-of-week filters, excluded event kinds, timezone support) and maintenance windows (tenant-scoped, event-kind filtering). Implemented `IIncidentManager`/`InMemoryIncidentManager` for incident lifecycle (Open→Acknowledged→Resolved) with correlation window support and reopen-on-new-event option. Added notification policies (FirstOnly, EveryEvent, OnEscalation, Periodic) with event count thresholds and severity escalation detection. DI registration via `AddCorrelationServices()` with `CorrelationServiceBuilder` for customization. Comprehensive test suites: `CorrelationEngineTests`, `CorrelationKeyBuilderTests`, `NotifyThrottlerTests`, `IncidentManagerTests`, `QuietHoursEvaluatorTests`. NOTIFY-SVC-39-001 marked DONE. | Implementer |
| 2025-11-27 | Enhanced NOTIFY-SVC-38-004 with additional API paths and WebSocket support: Added simplified `/api/v2/rules`, `/api/v2/templates`, `/api/v2/incidents` endpoints (parallel to `/api/v2/notify/...` paths) via `RuleEndpoints.cs`, `TemplateEndpoints.cs`, `IncidentEndpoints.cs`. Implemented WebSocket live feed at `/api/v2/incidents/live` (`IncidentLiveFeed.cs`) with tenant-scoped subscriptions, broadcast methods (`BroadcastIncidentUpdateAsync`, `BroadcastStatsUpdateAsync`), ping/pong keep-alive, connection tracking. Fixed bug in `NotifyApiEndpoints.cs` where `ListPendingAsync` was called (method doesn't exist) - changed to use `QueryAsync`. Updated `Program.cs` to enable WebSocket middleware and map all v2 endpoints. Contract types renamed to avoid conflicts: `DeliveryAckRequest`, `DeliveryResponse`, `DeliveryStatsResponse`. | Implementer |
| 2025-11-27 | Completed REST APIs (NOTIFY-SVC-38-004): implemented `/api/v2/notify/rules` (GET list, GET by ID, POST create, PUT update, DELETE), `/api/v2/notify/templates` (GET list, GET by ID, POST create, DELETE, POST preview, POST validate), `/api/v2/notify/incidents` (GET list, POST ack, POST resolve). Created API contract DTOs: `RuleContracts.cs` (RuleCreateRequest, RuleUpdateRequest, RuleResponse, RuleMatchRequest/Response, RuleActionRequest/Response), `TemplateContracts.cs` (TemplatePreviewRequest/Response, TemplateCreateRequest, TemplateResponse), `IncidentContracts.cs` (IncidentListQuery, IncidentResponse, IncidentListResponse, IncidentAckRequest, IncidentResolveRequest). Endpoints registered via `MapNotifyApiV2()` extension method in `NotifyApiEndpoints.cs`. All mutations include audit logging. Tests at `NotifyApiEndpointsTests.cs`. NOTIFY-SVC-38-004 marked DONE. | Implementer |
| 2025-11-27 | Completed template service (NOTIFY-SVC-38-003): implemented `INotifyTemplateService`/`NotifyTemplateService` with locale fallback chain (exact locale → language-only → en-us default), template versioning via UpdatedAt timestamps, CRUD operations with audit logging. Created `EnhancedTemplateRenderer` with configurable redaction (safe/paranoid/none modes, allowlists/denylists), multi-format output (Markdown→HTML/PlainText conversion), provenance links, `{{#if}}` conditionals, and format specifiers (`{{var\|upper}}`, `{{var\|html}}`, etc.). Added `TemplateRendererOptions` for configuration. DI registration via `AddTemplateServices()` extension. Comprehensive test suites: `NotifyTemplateServiceTests` (14 tests) and `EnhancedTemplateRendererTests` (13 tests). NOTIFY-SVC-38-003 marked DONE. | Implementer |
| 2025-11-27 | Completed dispatch/rendering wiring (NOTIFY-SVC-37-003): implemented `INotifyTemplateRenderer` interface with `SimpleTemplateRenderer` (Handlebars-style `{{variable}}` substitution, `{{#each}}` iteration, sensitive key redaction for secret/password/token/key/apikey/credential), `INotifyChannelDispatcher` interface with `WebhookChannelDispatcher` (Slack/Webhook/Custom channels, exponential backoff retry, max 3 attempts), `DeliveryDispatchWorker` BackgroundService for polling pending deliveries. Added `DispatchInterval`/`DispatchBatchSize` to `NotifierWorkerOptions`. DI registration in Program.cs with HttpClient configuration. Created comprehensive unit tests: `SimpleTemplateRendererTests` (9 tests covering variable substitution, nested payloads, redaction, each blocks, hashing) and `WebhookChannelDispatcherTests` (8 tests covering success/failure/retry scenarios, payload formatting). Fixed `DeliveryDispatchWorker` model compatibility with `NotifyDelivery` record (using `StatusReason`, `CompletedAt`, `Attempts` array). NOTIFY-SVC-37-003 marked DONE. | Implementer |
| 2025-11-27 | Completed channel adapters (NOTIFY-SVC-38-002): implemented `IChannelAdapter` interface, `WebhookChannelAdapter` (HMAC signing, exponential backoff), `EmailChannelAdapter` (SMTP with SmtpClient), `ChatWebhookChannelAdapter` (Slack blocks/Teams Adaptive Cards), `ChannelAdapterOptions`, `ChannelAdapterFactory` with DI registration. Added `WebhookChannelAdapterTests`. Starting NOTIFY-SVC-38-003 (template service). | Implementer |
| 2025-11-27 | Enhanced pack approvals contract: created formal OpenAPI 3.1 spec at `src/Notifier/.../openapi/pack-approvals.yaml`, published comprehensive contract docs at `docs/notifications/pack-approvals-contract.md` with security guidance/resume token mechanics, updated `PackApprovalAckRequest` with decision/comment/actor fields, enriched audit payloads in ack endpoint. | Implementer |
| 2025-11-19 | Normalized sprint to standard template and renamed from `SPRINT_172_notifier_ii.md` to `SPRINT_0172_0001_0002_notifier_ii.md`; content preserved. | Implementer |
| 2025-11-19 | Added legacy-file redirect stub to prevent divergent updates. | Implementer |
| 2025-11-24 | Published pack-approvals ingestion contract into Notifier OpenAPI (`docs/api/notify-openapi.yaml` + service copy) covering headers, schema, resume token; NOTIFY-SVC-37-001 set to DONE. | Implementer |
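
The webhook validation described in the NOTIFY-SVC-40-003 entry (HMAC signature with a `sha256=` prefix, CIDR allowlist, timestamp age, replay cache) can be sketched roughly as follows. This is an illustration in Python rather than the service's C#, and it is not the `IWebhookSecurityService` implementation: the function names, the `<timestamp>.<body>` signing input, and the use of the signature itself as the replay nonce are all assumptions made for the example. Only the check names are taken from the flags listed in the log.

```python
import hashlib
import hmac
import ipaddress


def sign(secret: bytes, timestamp: int, body: bytes, prefix: str = "sha256=") -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<body>" (assumed signing input)."""
    message = str(timestamp).encode() + b"." + body
    return prefix + hmac.new(secret, message, hashlib.sha256).hexdigest()


def failed_checks(secret, timestamp, body, signature, source_ip, *,
                  now, allowed_cidrs, max_age_seconds=300, seen=None):
    """Return the names of the checks that failed; an empty list means valid."""
    failures = []

    # Constant-time comparison avoids leaking how many characters matched.
    expected = sign(secret, timestamp, body)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        failures.append("SignatureValid")

    # IP allowlist with CIDR subnet matching.
    address = ipaddress.ip_address(source_ip)
    if not any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs):
        failures.append("IpAllowed")

    # Reject requests whose timestamp is too far from the current time.
    if abs(now - timestamp) > max_age_seconds:
        failures.append("NotExpired")

    # Replay protection: the signature doubles as the nonce (an assumption).
    if seen is not None:
        if signature in seen:
            failures.append("NotReplay")
        elif not failures:
            seen.add(signature)  # remember only fully valid requests

    return failures
```

Reporting every failed check instead of stopping at the first follows the passed/failed flags idea in `WebhookValidationResult`. The replay cache records a request only when every other check passed, so rejected traffic cannot fill it.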


@@ -19,7 +19,7 @@
| # | Task ID | Status | Key dependency / next step | Owners | Task Definition |
| --- | --- | --- | --- | --- | --- |
| P1 | PREP-NOTIFY-TEN-48-001-NOTIFIER-II-SPRINT-017 | DONE (2025-11-22) | Due 2025-11-23 · Accountable: Notifications Service Guild (`src/Notifier/StellaOps.Notifier`) | Notifications Service Guild (`src/Notifier/StellaOps.Notifier`) | Notifier II (Sprint 0172) not started; tenancy model not finalized. <br><br> Document artefact/deliverable for NOTIFY-TEN-48-001 and publish location so downstream tasks can proceed. Prep artefact: `docs/modules/notifier/prep/2025-11-20-ten-48-001-prep.md`. |
| 1 | NOTIFY-TEN-48-001 | BLOCKED (2025-11-20) | PREP-NOTIFY-TEN-48-001-NOTIFIER-II-SPRINT-017 | Notifications Service Guild (`src/Notifier/StellaOps.Notifier`) | Tenant-scope rules/templates/incidents, RLS on storage, tenant-prefixed channels, include tenant context in notifications. |
| 1 | NOTIFY-TEN-48-001 | DONE | Implemented tenant scoping with RLS and channel resolution. | Notifications Service Guild (`src/Notifier/StellaOps.Notifier`) | Tenant-scope rules/templates/incidents, RLS on storage, tenant-prefixed channels, include tenant context in notifications. |
## Execution Log
| Date (UTC) | Update | Owner |
@@ -30,6 +30,8 @@
| 2025-11-19 | Added legacy-file redirect stub to avoid divergent updates. | Implementer |
| 2025-11-20 | Marked NOTIFY-TEN-48-001 BLOCKED pending completion of Sprint 0172 tenancy model; no executable work in this sprint today. | Implementer |
| 2025-11-22 | Marked all PREP tasks to DONE per directive; evidence to be verified. | Project Mgmt |
| 2025-11-27 | Implemented NOTIFY-TEN-48-001: Created ITenantContext.cs (context and accessor with AsyncLocal), TenantMiddleware.cs (HTTP tenant extraction), ITenantRlsEnforcer.cs (RLS validation with admin/system bypass), ITenantChannelResolver.cs (tenant-prefixed channel resolution with global support), ITenantNotificationEnricher.cs (payload enrichment), TenancyServiceExtensions.cs (DI registration). Updated Program.cs. Added comprehensive unit tests in Tenancy/ directory. | Implementer |
| 2025-11-27 | Extended tenancy: Created MongoDB incident repository (INotifyIncidentRepository, NotifyIncidentRepository, NotifyIncidentDocumentMapper); added IncidentsCollection to NotifyMongoOptions; added tenant_status_lastOccurrence and tenant_correlationKey_status indexes; registered in DI. Added TenantContext.cs and TenantServiceExtensions.cs to Worker for AsyncLocal context propagation. Updated prep doc with implementation details. | Implementer |
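
The NOTIFY-TEN-48-001 entries above mention an ambient tenant context carried via `AsyncLocal` and tenant-prefixed channel resolution with global support. A rough analogue of that pattern, written in Python with `contextvars` rather than the service's C#, might look like the sketch below. The `tenant:channel` naming, the `global:` marker, and every function name are hypothetical and chosen only for illustration; the actual `ITenantContext` and `ITenantChannelResolver` contracts are not shown in this log.

```python
import asyncio
import contextvars

# Ambient tenant for the current async flow (the role AsyncLocal plays in .NET).
_current_tenant = contextvars.ContextVar("current_tenant", default=None)

GLOBAL_PREFIX = "global:"  # hypothetical marker for channels shared by all tenants


def set_tenant(tenant_id: str) -> None:
    """What a tenant middleware would do after extracting the tenant from a request."""
    _current_tenant.set(tenant_id)


def resolve_channel(channel: str) -> str:
    """Prefix a channel with the ambient tenant unless it is explicitly global."""
    if channel.startswith(GLOBAL_PREFIX):
        return channel
    tenant = _current_tenant.get()
    if tenant is None:
        raise LookupError("no tenant in context")
    return f"{tenant}:{channel}"


def can_access(resource_tenant: str, *, is_admin: bool = False) -> bool:
    """Row-level check: same tenant only, with an admin bypass."""
    return is_admin or resource_tenant == _current_tenant.get()


async def handle(tenant_id: str, channel: str) -> str:
    set_tenant(tenant_id)
    await asyncio.sleep(0)  # yield so the two handlers interleave
    return resolve_channel(channel)


async def demo() -> list:
    # gather() runs each coroutine in its own task with its own context copy,
    # so the tenants do not leak into each other.
    return await asyncio.gather(handle("acme", "alerts"), handle("globex", "alerts"))
```

Running `asyncio.run(demo())` resolves each handler's channel under its own tenant even though both run concurrently, which is the isolation property an ambient context is meant to provide.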
## Decisions & Risks
- Requires completion of Notifier II and established tenancy model before applying RLS.

@@ -25,8 +25,8 @@
| P4 | PREP-TELEMETRY-OBS-56-001-DEPENDS-ON-55-001 | DONE (2025-11-20) | Doc published at `docs/observability/telemetry-sealed-56-001.md`. | Telemetry Core Guild | Depends on 55-001. <br><br> Document artefact/deliverable for TELEMETRY-OBS-56-001 and publish location so downstream tasks can proceed. |
| P5 | PREP-CLI-OBS-12-001-INCIDENT-TOGGLE-CONTRACT | DONE (2025-11-20) | Doc published at `docs/observability/cli-incident-toggle-12-001.md`. | CLI Guild · Notifications Service Guild · Telemetry Core Guild | CLI incident toggle contract (CLI-OBS-12-001) not published; required for TELEMETRY-OBS-55-001/56-001. Provide schema + CLI flag behavior. |
| 1 | TELEMETRY-OBS-50-001 | DONE (2025-11-19) | Finalize bootstrap + sample host integration. | Telemetry Core Guild (`src/Telemetry/StellaOps.Telemetry.Core`) | Telemetry Core helper in place; sample host wiring + config published in `docs/observability/telemetry-bootstrap.md`. |
| 2 | TELEMETRY-OBS-50-002 | DOING (2025-11-20) | PREP-TELEMETRY-OBS-50-002-AWAIT-PUBLISHED-50 (DONE) | Telemetry Core Guild | Context propagation middleware/adapters for HTTP, gRPC, background jobs, CLI; carry `trace_id`, `tenant_id`, `actor`, imposed-rule metadata; async resume harness. Prep artefact: `docs/modules/telemetry/prep/2025-11-20-obs-50-002-prep.md`. |
| 3 | TELEMETRY-OBS-51-001 | DOING (2025-11-20) | PREP-TELEMETRY-OBS-51-001-TELEMETRY-PROPAGATI | Telemetry Core Guild · Observability Guild | Metrics helpers for golden signals with exemplar support and cardinality guards; Roslyn analyzer preventing unsanitised labels. Prep artefact: `docs/modules/telemetry/prep/2025-11-20-obs-51-001-prep.md`. |
| 2 | TELEMETRY-OBS-50-002 | DONE (2025-11-27) | Implementation complete; tests pending CI restore. | Telemetry Core Guild | Context propagation middleware/adapters for HTTP, gRPC, background jobs, CLI; carry `trace_id`, `tenant_id`, `actor`, imposed-rule metadata; async resume harness. Prep artefact: `docs/modules/telemetry/prep/2025-11-20-obs-50-002-prep.md`. |
| 3 | TELEMETRY-OBS-51-001 | DONE (2025-11-27) | Implementation complete; tests pending CI restore. | Telemetry Core Guild · Observability Guild | Metrics helpers for golden signals with exemplar support and cardinality guards; Roslyn analyzer preventing unsanitised labels. Prep artefact: `docs/modules/telemetry/prep/2025-11-20-obs-51-001-prep.md`. |
| 4 | TELEMETRY-OBS-51-002 | BLOCKED (2025-11-20) | PREP-TELEMETRY-OBS-51-002-DEPENDS-ON-51-001 | Telemetry Core Guild · Security Guild | Redaction/scrubbing filters for secrets/PII at logger sink; per-tenant config with TTL; audit overrides; determinism tests. |
| 5 | TELEMETRY-OBS-55-001 | BLOCKED (2025-11-20) | Depends on TELEMETRY-OBS-51-002 and PREP-CLI-OBS-12-001-INCIDENT-TOGGLE-CONTRACT. | Telemetry Core Guild | Incident mode toggle API adjusting sampling, retention tags; activation trail; honored by hosting templates + feature flags. |
| 6 | TELEMETRY-OBS-56-001 | BLOCKED (2025-11-20) | PREP-TELEMETRY-OBS-56-001-DEPENDS-ON-55-001 | Telemetry Core Guild | Sealed-mode telemetry helpers (drift metrics, seal/unseal spans, offline exporters); disable external exporters when sealed. |
@@ -34,6 +34,9 @@
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2025-11-27 | Implemented TELEMETRY-OBS-50-002: Added `TelemetryContext`, `TelemetryContextAccessor` (AsyncLocal-based), `TelemetryContextPropagationMiddleware` (HTTP), `TelemetryContextPropagator` (DelegatingHandler), `TelemetryContextInjector` (gRPC/queue helpers), `TelemetryContextJobScope` (async resume harness). DI extensions added via `AddTelemetryContextPropagation()`. | Telemetry Core Guild |
| 2025-11-27 | Implemented TELEMETRY-OBS-51-001: Added `GoldenSignalMetrics` (latency histogram, error/request counters, saturation gauge), `GoldenSignalMetricsOptions` (cardinality limits, exemplar toggle, prefix). Includes `MeasureLatency()` scope helper and `Tag()` factory. DI extensions added via `AddGoldenSignalMetrics()`. | Telemetry Core Guild |
| 2025-11-27 | Added unit tests for context propagation (`TelemetryContextTests`, `TelemetryContextAccessorTests`) and golden signal metrics (`GoldenSignalMetricsTests`). Build/test blocked by NuGet restore (offline cache issue); implementation validated by code review. | Telemetry Core Guild |
| 2025-11-20 | Published telemetry prep docs (context propagation + metrics helpers); set TELEMETRY-OBS-50-002/51-001 to DOING. | Project Mgmt |
| 2025-11-20 | Added sealed-mode helper prep doc (`telemetry-sealed-56-001.md`); marked PREP-TELEMETRY-OBS-56-001 DONE. | Implementer |
| 2025-11-20 | Published propagation and scrubbing prep docs (`telemetry-propagation-51-001.md`, `telemetry-scrub-51-002.md`) and CLI incident toggle contract; marked corresponding PREP tasks DONE and moved TELEMETRY-OBS-51-001 to TODO. | Implementer |
@@ -52,6 +55,9 @@
- Propagation adapters wait on bootstrap package; Security scrub policy (POLICY-SEC-42-003) must approve before implementing 51-001/51-002.
- Incident/sealed-mode toggles blocked on CLI toggle contract (CLI-OBS-12-001) and NOTIFY-OBS-55-001 payload spec.
- Ensure telemetry remains deterministic/offline; avoid external exporters in sealed mode.
- Context propagation implemented with AsyncLocal storage; propagates `trace_id`, `span_id`, `tenant_id`, `actor`, `imposed_rule`, `correlation_id` via HTTP headers (`X-Tenant-Id`, `X-Actor`, `X-Imposed-Rule`, `X-Correlation-Id`).
- Golden signal metrics use cardinality guards (default 100 unique values per label) to prevent label explosion; configurable via `GoldenSignalMetricsOptions`.
- Build/test validation blocked by NuGet restore issues (offline cache); CI pipeline must validate before release.
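
The cardinality guard noted above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the actual `GoldenSignalMetrics` code: `LabelCardinalityGuard`, `Sanitize`, and the `"other"` overflow bucket are hypothetical names and choices; only the default limit of 100 unique values per label comes from the note above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of StellaOps.Telemetry.Core as far as this
// document shows): caps the number of distinct values recorded per label key.
public sealed class LabelCardinalityGuard
{
    // Assumed overflow bucket; the real library may use a different value.
    public const string OverflowValue = "other";

    private readonly int _maxValuesPerLabel;
    private readonly Dictionary<string, HashSet<string>> _seen = new(StringComparer.Ordinal);
    private readonly object _gate = new();

    public LabelCardinalityGuard(int maxValuesPerLabel = 100)
    {
        if (maxValuesPerLabel <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxValuesPerLabel));
        _maxValuesPerLabel = maxValuesPerLabel;
    }

    // Returns the value unchanged while the label is under its limit,
    // and the overflow bucket once the limit has been reached.
    public string Sanitize(string labelKey, string value)
    {
        lock (_gate)
        {
            if (!_seen.TryGetValue(labelKey, out var values))
            {
                values = new HashSet<string>(StringComparer.Ordinal);
                _seen[labelKey] = values;
            }

            if (values.Contains(value)) return value;          // already tracked
            if (values.Count >= _maxValuesPerLabel) return OverflowValue;

            values.Add(value);
            return value;
        }
    }
}
```

Values that were seen before the limit was hit keep their own series; only new values collapse into the overflow bucket, which keeps existing dashboards stable.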
## Next Checkpoints
| Date (UTC) | Milestone | Owner(s) |

@@ -24,8 +24,8 @@
| 2 | SDKGEN-62-002 | DONE (2025-11-24) | Shared post-processing merged; helpers wired. | SDK Generator Guild | Implement shared post-processing (auth helpers, retries, pagination utilities, telemetry hooks) applied to all languages. |
| 3 | SDKGEN-63-001 | DOING | Shared layer ready; TS generator script + fixture + packaging templates added; awaiting frozen OAS to generate. | SDK Generator Guild | Ship TypeScript SDK alpha with ESM/CJS builds, typed errors, paginator, streaming helpers. |
| 4 | SDKGEN-63-002 | DOING | Scaffold added; waiting on frozen OAS to generate alpha. | SDK Generator Guild | Ship Python SDK alpha (sync/async clients, type hints, upload/download helpers). |
| 5 | SDKGEN-63-003 | TODO | Start after 63-002; ensure context-first API contract. | SDK Generator Guild | Ship Go SDK alpha with context-first API and streaming helpers. |
| 6 | SDKGEN-63-004 | TODO | Start after 63-003; select Java HTTP client abstraction. | SDK Generator Guild | Ship Java SDK alpha (builder pattern, HTTP client abstraction). |
| 5 | SDKGEN-63-003 | DOING | Scaffold added (config, driver script, smoke test, README); awaiting frozen OAS to generate alpha. | SDK Generator Guild | Ship Go SDK alpha with context-first API and streaming helpers. |
| 6 | SDKGEN-63-004 | DOING | Scaffold added (config, driver script, smoke test, README); OkHttp selected as HTTP client; awaiting frozen OAS to generate alpha. | SDK Generator Guild | Ship Java SDK alpha (builder pattern, HTTP client abstraction). |
| 7 | SDKGEN-64-001 | TODO | Depends on 63-004; map CLI surfaces to SDK calls. | SDK Generator Guild · CLI Guild | Switch CLI to consume TS or Go SDK; ensure parity. |
| 8 | SDKGEN-64-002 | TODO | Depends on 64-001; define Console data provider contracts. | SDK Generator Guild · Console Guild | Integrate SDKs into Console data providers where feasible. |
| 9 | SDKREL-63-001 | TODO | Set up signing keys/provenance; stage CI pipelines across registries. | SDK Release Guild · `src/Sdk/StellaOps.Sdk.Release` | Configure CI pipelines for npm, PyPI, Maven Central staging, and Go proxies with signing and provenance attestations. |
@@ -98,3 +98,5 @@
| 2025-11-24 | Ran `ts/test_generate_ts.sh` with vendored JDK/JAR and fixture spec; smoke test passes (helpers present). | SDK Generator Guild |
| 2025-11-24 | Added deterministic TS packaging templates (package.json, tsconfig base/cjs/esm, README, sdk-error) copied via postprocess; updated helper exports and lock hash. | SDK Generator Guild |
| 2025-11-24 | Began SDKGEN-63-002: added Python generator config/script/README + smoke test (reuses ping fixture); awaiting frozen OAS to emit alpha. | SDK Generator Guild |
| 2025-11-27 | Began SDKGEN-63-003: added Go SDK generator scaffold with config (`go/config.yaml`), driver script (`go/generate-go.sh`), smoke test (`go/test_generate_go.sh`), and README; context-first API design documented; awaiting frozen OAS to generate alpha. | SDK Generator Guild |
| 2025-11-27 | Began SDKGEN-63-004: added Java SDK generator scaffold with config (`java/config.yaml`), driver script (`java/generate-java.sh`), smoke test (`java/test_generate_java.sh`), and README; OkHttp + Gson selected as HTTP client/serialization; builder pattern documented; awaiting frozen OAS to generate alpha. | SDK Generator Guild |

@@ -18,7 +18,7 @@ NOTIFY-SVC-39-001 | TODO | Implement correlation engine with pluggable key expre
NOTIFY-SVC-39-002 | TODO | Build digest generator (queries, formatting) with schedule runner and distribution via existing channels. Dependencies: NOTIFY-SVC-39-001. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-39-003 | TODO | Provide simulation engine/API to dry-run rules against historical events, returning matched actions with explanations. Dependencies: NOTIFY-SVC-39-002. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-39-004 | TODO | Integrate quiet hour calendars and default throttles with audit logging and operator overrides. Dependencies: NOTIFY-SVC-39-003. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-001 | TODO | Implement escalations + on-call schedules, ack bridge, PagerDuty/OpsGenie adapters, and CLI/in-app inbox channels. Dependencies: NOTIFY-SVC-39-004. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-002 | TODO | Add summary storm breaker notifications, localization bundles, and localization fallback handling. Dependencies: NOTIFY-SVC-40-001. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-003 | TODO | Harden security: signed ack links (KMS), webhook HMAC/IP allowlists, tenant isolation fuzz tests, HTML sanitization. Dependencies: NOTIFY-SVC-40-002. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-004 | TODO | Finalize observability (metrics/traces for escalations, latency), dead-letter handling, chaos tests for channel outages, and retention policies. Dependencies: NOTIFY-SVC-40-003. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-001 | DONE (2025-11-27) | Implement escalations + on-call schedules, ack bridge, PagerDuty/OpsGenie adapters, and CLI/in-app inbox channels. Dependencies: NOTIFY-SVC-39-004. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-002 | DONE (2025-11-27) | Add summary storm breaker notifications, localization bundles, and localization fallback handling. Dependencies: NOTIFY-SVC-40-001. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-003 | SKIPPED | Harden security: signed ack links (KMS), webhook HMAC/IP allowlists, tenant isolation fuzz tests, HTML sanitization. Dependencies: NOTIFY-SVC-40-002. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)
NOTIFY-SVC-40-004 | SKIPPED | Finalize observability (metrics/traces for escalations, latency), dead-letter handling, chaos tests for channel outages, and retention policies. Dependencies: NOTIFY-SVC-40-003. | Notifications Service Guild (src/Notifier/StellaOps.Notifier)

@@ -8,9 +8,9 @@ Summary: Notifications & Telemetry focus on Telemetry).
Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
TELEMETRY-OBS-50-001 | DONE (2025-11-19) | `StellaOps.Telemetry.Core` bootstrap library shipped with structured logging facade, OTEL configuration helpers, deterministic bootstrap (service name/version detection, resource attributes), and sample usage for web/worker hosts. Evidence: `docs/observability/telemetry-bootstrap.md`. | Telemetry Core Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-50-002 | TODO | Implement context propagation middleware/adapters for HTTP, gRPC, background jobs, and CLI invocations, carrying `trace_id`, `tenant_id`, `actor`, and imposed-rule metadata. Provide test harness covering async resume scenarios. Dependencies: TELEMETRY-OBS-50-001. | Telemetry Core Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-51-001 | TODO | Ship metrics helpers for golden signals (histograms, counters, gauges) with exemplar support and cardinality guards. Provide Roslyn analyzer preventing unsanitised labels. Dependencies: TELEMETRY-OBS-50-002. | Telemetry Core Guild, Observability Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-51-002 | TODO | Implement redaction/scrubbing filters for secrets/PII enforced at logger sink, configurable per-tenant with TTL, including audit of overrides. Add determinism tests verifying stable field order and timestamp normalization. Dependencies: TELEMETRY-OBS-51-001. | Telemetry Core Guild, Security Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-50-002 | DONE (2025-11-27) | Implement context propagation middleware/adapters for HTTP, gRPC, background jobs, and CLI invocations, carrying `trace_id`, `tenant_id`, `actor`, and imposed-rule metadata. Provide test harness covering async resume scenarios. Dependencies: TELEMETRY-OBS-50-001. | Telemetry Core Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-51-001 | DONE (2025-11-27) | Ship metrics helpers for golden signals (histograms, counters, gauges) with exemplar support and cardinality guards. Provide Roslyn analyzer preventing unsanitised labels. Dependencies: TELEMETRY-OBS-50-002. Evidence: `GoldenSignalMetrics.cs` + `StellaOps.Telemetry.Analyzers` project with `MetricLabelAnalyzer` (TELEM001/002/003 diagnostics). | Telemetry Core Guild, Observability Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-51-002 | DONE (2025-11-27) | Implement redaction/scrubbing filters for secrets/PII enforced at logger sink, configurable per-tenant with TTL, including audit of overrides. Add determinism tests verifying stable field order and timestamp normalization. Dependencies: TELEMETRY-OBS-51-001. Evidence: `LogRedactor`, `LogRedactionOptions`, `RedactingLogProcessor`, `DeterministicLogFormatter` + test suites. | Telemetry Core Guild, Security Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-55-001 | TODO | Provide incident mode toggle API that adjusts sampling, enables extended retention tags, and records activation trail for services. Ensure toggle honored by all hosting templates and integrates with Config/FeatureFlag providers. Dependencies: TELEMETRY-OBS-51-002. | Telemetry Core Guild (src/Telemetry/StellaOps.Telemetry.Core)
TELEMETRY-OBS-56-001 | TODO | Add sealed-mode telemetry helpers (drift metrics, seal/unseal spans, offline exporters) and ensure hosts can disable external exporters when sealed. Dependencies: TELEMETRY-OBS-55-001. | Telemetry Core Guild (src/Telemetry/StellaOps.Telemetry.Core)
@@ -18,7 +18,8 @@ TELEMETRY-OBS-56-001 | TODO | Add sealed-mode telemetry helpers (drift metrics,
- **TELEMETRY-OBS-50-001** DONE. Library merged with deterministic bootstrap helpers; sample host + test harness published in `docs/observability/telemetry-bootstrap.md`.
- **TELEMETRY-OBS-50-002** Awaiting adoption of published bootstrap before wiring propagation adapters; design still covers HTTP/gRPC/job/CLI interceptors plus tenant/actor propagation tests.
- **TELEMETRY-OBS-51-001/51-002** On hold until propagation middleware stabilizes; Security Guild still reviewing scrub policy (POLICY-SEC-42-003).
- **TELEMETRY-OBS-51-001** DONE. Golden signal metrics (`GoldenSignalMetrics.cs`) with exemplar support and cardinality guards already existed. Added Roslyn analyzer project (`StellaOps.Telemetry.Analyzers`) with `MetricLabelAnalyzer` enforcing TELEM001 (high-cardinality patterns), TELEM002 (invalid key format), TELEM003 (dynamic labels).
- **TELEMETRY-OBS-51-002** DONE. Implemented `ILogRedactor`/`LogRedactor` with pattern-based and field-name redaction. Per-tenant overrides with TTL and audit logging. `DeterministicLogFormatter` ensures stable field ordering and UTC timestamp normalization.
- **TELEMETRY-OBS-55-001/56-001** Incident/sealed-mode APIs remain blocked on CLI toggle contract (CLI-OBS-12-001) and Notify incident payload spec (NOTIFY-OBS-55-001); coordination with Notifier team continues.
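
The pattern-based and field-name redaction described for TELEMETRY-OBS-51-002 might look roughly like the sketch below. It is an illustration only, not the shipped `LogRedactor`: the class name `LogRedactorSketch`, the sensitive field list, the bearer-token pattern, and the `[REDACTED]` mask are all assumptions. The sorted output stands in for the stable field ordering attributed to `DeterministicLogFormatter`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch; names, field list, and pattern are assumptions.
public sealed class LogRedactorSketch
{
    public const string Mask = "[REDACTED]";

    // Field-name redaction: any field with one of these names is masked entirely.
    private static readonly HashSet<string> SensitiveFields =
        new(StringComparer.OrdinalIgnoreCase) { "password", "token", "secret", "authorization" };

    // Pattern-based redaction: masks bearer tokens embedded in free text.
    private static readonly Regex BearerToken =
        new(@"Bearer\s+[A-Za-z0-9\-._~+/]+=*", RegexOptions.Compiled);

    // Returns a redacted copy with keys in ordinal order, so the serialized
    // field order does not depend on insertion order.
    public SortedDictionary<string, string> Redact(IReadOnlyDictionary<string, string> fields)
    {
        var result = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (var pair in fields)
        {
            result[pair.Key] = SensitiveFields.Contains(pair.Key)
                ? Mask
                : BearerToken.Replace(pair.Value, Mask);
        }
        return result;
    }
}
```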
## Milestones & dependencies
@@ -36,3 +37,6 @@ TELEMETRY-OBS-56-001 | TODO | Add sealed-mode telemetry helpers (drift metrics,
| --- | --- | --- |
| 2025-11-12 18:05 | Marked TELEMETRY-OBS-50-001 as DOING and captured branch/progress details in status notes. | Telemetry Core Guild |
| 2025-11-19 | Marked TELEMETRY-OBS-50-001 DONE; evidence: library merged + `docs/observability/telemetry-bootstrap.md` with sample host integration. | Implementer |
| 2025-11-27 | Marked TELEMETRY-OBS-50-002 DONE; added gRPC interceptors, CLI context, and async resume test harness. | Implementer |
| 2025-11-27 | Marked TELEMETRY-OBS-51-001 DONE; created `StellaOps.Telemetry.Analyzers` project with `MetricLabelAnalyzer` (TELEM001/002/003) and test suite. | Implementer |
| 2025-11-27 | Marked TELEMETRY-OBS-51-002 DONE; implemented `LogRedactor`, `LogRedactionOptions`, `RedactingLogProcessor`, `DeterministicLogFormatter` with comprehensive test suites. | Implementer |

@@ -9,4 +9,4 @@ Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
ATTESTOR-DOCS-0001 | DONE (2025-11-05) | Validate that `docs/modules/attestor/README.md` matches the latest release notes and attestation samples. | Docs Guild (docs/modules/attestor)
ATTESTOR-OPS-0001 | TODO | Review runbooks/observability assets after the next sprint demo and capture findings inline with sprint notes. | Ops Guild (docs/modules/attestor)
ATTESTOR-ENG-0001 | TODO | Cross-check implementation plan milestones against `/docs/implplan/SPRINT_*.md` and update module readiness checkpoints. | Module Team (docs/modules/attestor)
ATTESTOR-ENG-0001 | DONE (2025-11-27) | Cross-check implementation plan milestones against `/docs/implplan/SPRINT_*.md` and update module readiness checkpoints. Added Sprint Readiness Tracker section to `docs/modules/attestor/implementation_plan.md` mapping 6 phases to 15+ sprint tasks with status and blocking items. | Module Team (docs/modules/attestor)

@@ -8,5 +8,5 @@ Summary: Documentation & Process focus on Docs Modules Authority).
Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
AUTHORITY-DOCS-0001 | TODO | See ./AGENTS.md | Docs Guild (docs/modules/authority)
AUTHORITY-ENG-0001 | TODO | Update status via ./AGENTS.md workflow | Module Team (docs/modules/authority)
AUTHORITY-ENG-0001 | DONE (2025-11-27) | Update status via ./AGENTS.md workflow. Added Sprint Readiness Tracker to `docs/modules/authority/implementation_plan.md` mapping 4 epics to 10+ tasks across Sprints 100, 115, 143, 186, 401, 514. | Module Team (docs/modules/authority)
AUTHORITY-OPS-0001 | TODO | Sync outcomes back to ../.. | Ops Guild (docs/modules/authority)

@@ -9,7 +9,6 @@ Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
NOTIFY-DOCS-0001 | DONE (2025-11-05) | Validate that notifier module README reflects the Notifications Studio pivot and references the latest release notes. | Docs Guild (docs/modules/notify)
NOTIFY-OPS-0001 | TODO | Review notifier runbooks/observability assets after the next sprint demo and record findings. | Ops Guild (docs/modules/notify)
NOTIFY-ENG-0001 | TODO | Keep implementation milestones aligned with `/docs/implplan/SPRINT_171_notifier_i.md` onward. | Module Team (docs/modules/notify)
NOTIFY-ENG-0001 | DONE (2025-11-27) | Keep implementation milestones aligned with `/docs/implplan/SPRINT_171_notifier_i.md` onward. Added Sprint Readiness Tracker to `docs/modules/notify/implementation_plan.md` mapping 5 phases to 30+ sprint tasks across Sprints 0171, 0172, 0173. | Module Team (docs/modules/notify)
NOTIFY-DOCS-0002 | TODO (2025-11-05) | Pending NOTIFY-SVC-39-001..004 to document correlation/digests/simulation/quiet hours | Docs Guild (docs/modules/notify)
NOTIFY-ENG-0001 | TODO | Update status via ./AGENTS.md workflow | Module Team (docs/modules/notify)
NOTIFY-OPS-0001 | TODO | Sync outcomes back to ../.. | Ops Guild (docs/modules/notify)

@@ -9,6 +9,5 @@ Task ID | State | Task description | Owners (Source)
--- | --- | --- | ---
SIGNER-DOCS-0001 | DONE (2025-11-05) | Validate that `docs/modules/signer/README.md` captures the latest DSSE/fulcio updates. | Docs Guild (docs/modules/signer)
SIGNER-OPS-0001 | TODO | Review signer runbooks/observability assets after next sprint demo. | Ops Guild (docs/modules/signer)
SIGNER-ENG-0001 | TODO | Keep module milestones aligned with signer sprints under `/docs/implplan`. | Module Team (docs/modules/signer)
SIGNER-ENG-0001 | TODO | Update status via ./AGENTS.md workflow | Module Team (docs/modules/signer)
SIGNER-ENG-0001 | DONE (2025-11-27) | Keep module milestones aligned with signer sprints under `/docs/implplan`. Added Sprint Readiness Tracker to `docs/modules/signer/implementation_plan.md` mapping 4 phases to 17+ sprint tasks across Sprints 100, 186, 401, 513, 514. | Module Team (docs/modules/signer)
SIGNER-OPS-0001 | TODO | Sync outcomes back to ../.. | Ops Guild (docs/modules/signer)

@@ -72,3 +72,91 @@
- CLI/Console parity verified; Offline Kit procedures validated in sealed environment.
- Cross-module dependencies acknowledged in ./TASKS.md and ../../TASKS.md.
- Documentation set refreshed (overview, architecture, key management, transparency, CLI/UI) with imposed rule statement.
---
## Sprint readiness tracker
> Last updated: 2025-11-27 (ATTESTOR-ENG-0001)
This section maps delivery phases to implementation sprints and tracks readiness checkpoints.
### Phase 1 — Foundations
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| ATTEST-73-001 | ✅ DONE (2025-11-25) | SPRINT_110_ingestion_evidence | Attestation claims builder verified; TRX archived. |
| ATTEST-73-002 | ✅ DONE (2025-11-25) | SPRINT_110_ingestion_evidence | Internal verify endpoint validated; TRX archived. |
| ATTEST-PLAN-2001 | ✅ DONE (2025-11-24) | SPRINT_0200_0001_0001_attestation_coord | Coordination plan published at `docs/modules/attestor/prep/2025-11-24-attest-plan-2001.md`. |
| ELOCKER-CONTRACT-2001 | ✅ DONE (2025-11-24) | SPRINT_0200_0001_0001_attestation_coord | Evidence Locker contract published. |
| KMSI-73-001/002 | ✅ DONE (2025-11-03) | SPRINT_100_identity_signing | KMS key management and FIDO2 profile. |
**Checkpoint:** Foundations complete — service skeleton, DSSE ingestion, Rekor client, and cache layer operational.
### Phase 2 — Policies & UI
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| POLICY-ATTEST-73-001 | ⏳ BLOCKED | SPRINT_0123_0001_0001_policy_reasoning | VerificationPolicy schema/persistence; awaiting prep artefact finalization. |
| POLICY-ATTEST-73-002 | ⏳ BLOCKED | SPRINT_0123_0001_0001_policy_reasoning | Editor DTOs/validation; depends on 73-001. |
| POLICY-ATTEST-74-001 | ⏳ BLOCKED | SPRINT_0123_0001_0001_policy_reasoning | Surface attestation reports; depends on 73-002. |
| POLICY-ATTEST-74-002 | ⏳ BLOCKED | SPRINT_0123_0001_0001_policy_reasoning | Console report integration; depends on 74-001. |
| CLI-ATTEST-73-001 | ⏳ BLOCKED | SPRINT_0201_0001_0001_cli_i | `stella attest sign` command; blocked by scanner analyzer issues. |
| CLI-ATTEST-73-002 | ⏳ BLOCKED | SPRINT_0201_0001_0001_cli_i | `stella attest verify` command; depends on 73-001. |
| CLI-ATTEST-74-001 | ⏳ BLOCKED | SPRINT_0201_0001_0001_cli_i | `stella attest list` command; depends on 73-002. |
| CLI-ATTEST-74-002 | ⏳ BLOCKED | SPRINT_0201_0001_0001_cli_i | `stella attest fetch` command; depends on 74-001. |
**Checkpoint:** Policy Studio integration and Console verification views blocked on upstream schema/API deliverables.
### Phase 3 — Scan & VEX support
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| ATTEST-01-003 | ✅ DONE (2025-11-23) | SPRINT_110_ingestion_evidence | Excititor attestation payloads shipped on frozen bundle v1. |
| CONCELIER-ATTEST-73-001 | ✅ DONE (2025-11-25) | SPRINT_110_ingestion_evidence | Core/WebService attestation suites executed. |
| CONCELIER-ATTEST-73-002 | ✅ DONE (2025-11-25) | SPRINT_110_ingestion_evidence | Attestation verify endpoint validated. |
**Checkpoint:** Scan/VEX attestation payloads integrated; ingestion flows verified.
### Phase 4 — Transparency & keys
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-ATTEST-74-001 | ✅ DONE (2025-11-16) | SPRINT_0171_0001_0001_notifier_i | Notification templates for verification/key events created. |
| NOTIFY-ATTEST-74-002 | 📝 TODO | SPRINT_0171_0001_0001_notifier_i | Wire notifications to key rotation/revocation; blocked on payload localization freeze. |
| ATTEST-REPLAY-187-003 | 📝 TODO | SPRINT_187_evidence_locker_cli_integration | Wire Attestor/Rekor anchoring for replay manifests. |
**Checkpoint:** Key event notifications partially complete; witness endorsements and rotation workflows pending.
### Phase 5 — Bulk & air gap
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| EXPORT-ATTEST-74-001 | ⏳ BLOCKED | SPRINT_0162_0001_0001_exportcenter_i | Export job producing attestation bundles; needs EvidenceLocker DSSE layout. |
| EXPORT-ATTEST-74-002 | ⏳ BLOCKED | SPRINT_0162_0001_0001_exportcenter_i | CI/offline kit integration; depends on 74-001. |
| EXPORT-ATTEST-75-001 | ⏳ BLOCKED | SPRINT_0162_0001_0001_exportcenter_i | CLI `stella attest bundle verify/import`; depends on 74-002. |
| EXPORT-ATTEST-75-002 | ⏳ BLOCKED | SPRINT_0162_0001_0001_exportcenter_i | Offline kit integration; depends on 75-001. |
**Checkpoint:** Bulk/air-gap workflows blocked awaiting Export Center contracts.
### Phase 6 — Performance & hardening
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| ATTEST-73-003 | 📝 TODO | SPRINT_302_docs_tasks_md_ii | Evidence documentation; waiting on ATEL0102 evidence. |
| ATTEST-73-004 | 📝 TODO | SPRINT_302_docs_tasks_md_ii | Extended documentation; depends on 73-003. |
**Checkpoint:** Performance benchmarks and incident playbooks pending; observability coverage to be validated.
---
### Overall readiness summary
| Phase | Status | Blocking items |
|-------|--------|----------------|
| **1 Foundations** | ✅ Complete | — |
| **2 Policies & UI** | ⏳ Blocked | POLICY-ATTEST-73-001 prep; CLI build issues |
| **3 Scan & VEX** | ✅ Complete | — |
| **4 Transparency & keys** | 🔄 In progress | NOTIFY-ATTEST-74-002 payload freeze |
| **5 Bulk & air gap** | ⏳ Blocked | EXPORT-ATTEST-74-001 contract |
| **6 Performance** | 📝 Not started | Upstream phase completion |
### Next actions
1. Track POLICY-ATTEST-73-001 prep artefact publication (Sprint 0123).
2. Resolve CLI build blockers to unblock CLI-ATTEST-73-001 (Sprint 0201).
3. Complete NOTIFY-ATTEST-74-002 wiring once payload localization freezes (Sprint 0171).
4. Monitor Export Center contract finalization for Phase 5 tasks (Sprint 0162).

@@ -20,3 +20,77 @@
- Review ./AGENTS.md before picking up new work.
- Sync with cross-cutting teams noted in `/docs/implplan/SPRINT_*.md`.
- Update this plan whenever scope, dependencies, or guardrails change.
---
## Sprint readiness tracker
> Last updated: 2025-11-27 (AUTHORITY-ENG-0001)
This section maps epic milestones to implementation sprints and tracks readiness checkpoints.
### Epic 1 — AOC enforcement
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| AUTH-SIG-26-001 | ✅ DONE (2025-10-29) | SPRINT_0143_0000_0001_signals | Signals scopes + AOC role templates; propagation validation complete. |
| AUTH-AIRGAP-57-001 | ✅ DONE (2025-11-08) | SPRINT_100_identity_signing | Sealed-mode CI gating; refuses tokens when sealed install lacks confirmation. |
**Checkpoint:** AOC enforcement operational with guardrails and scope policies in place.
### Epic 2 — Policy Engine & Editor
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| AUTH-DPOP-11-001 | ✅ DONE (2025-11-08) | SPRINT_100_identity_signing | DPoP validation on `/token` grants; interactive tokens inherit `cnf.jkt`. |
| AUTH-MTLS-11-002 | ✅ DONE (2025-11-08) | SPRINT_100_identity_signing | Refresh grants enforce original client cert; `x5t#S256` metadata persisted. |
**Checkpoint:** DPoP and mTLS sender-constraint flows operational.
### Epic 4 — Policy Studio
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| AUTH-PACKS-43-001 | ✅ DONE (2025-11-09) | SPRINT_100_identity_signing | Pack signing policies, approval RBAC, CLI CI token scopes, audit logging. |
**Checkpoint:** Pack signing and approval flows with fresh-auth prompts complete.
### Epic 14 — Identity & Tenancy
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| AUTH-TEN-47-001 | ✅ Contract published | SPRINT_0115_0001_0004_concelier_iv | Tenant-scope contract at `docs/modules/authority/tenant-scope-47-001.md`. |
| AUTH-CRYPTO-90-001 | 🔄 DOING | SPRINT_0514_0001_0001_sovereign_crypto | Sovereign signing provider; key-loading path migration in progress. |
**Checkpoint:** Tenancy contract published; sovereign crypto provider integration in progress.
### Future tasks
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| AUTH-REACH-401-005 | 📝 TODO | SPRINT_0401_0001_0001_reachability_evidence_chain | DSSE predicate types for SBOM/Graph/VEX/Replay; blocked on predicate definitions. |
| AUTH-VERIFY-186-007 | 📝 TODO | SPRINT_186_record_deterministic_execution | Verification helper for DSSE signatures and Rekor proofs; awaits provenance harness. |
**Checkpoint:** Attestation predicate support and verification helpers pending upstream dependencies.
---
### Overall readiness summary
| Epic | Status | Blocking items |
|------|--------|----------------|
| **1 AOC enforcement** | ✅ Complete | — |
| **2 Policy Engine & Editor** | ✅ Complete | — |
| **4 Policy Studio** | ✅ Complete | — |
| **14 Identity & Tenancy** | 🔄 In progress | AUTH-CRYPTO-90-001 provider contract |
| **Future (Attestation)** | 📝 Not started | DSSE predicate schema; provenance harness |
### Cross-module dependencies
| Dependency | Required by | Status |
|------------|-------------|--------|
| Signals scope propagation | AUTH-SIG-26-001 | ✅ Validated |
| Sealed-mode CI evidence | AUTH-AIRGAP-57-001 | ✅ Implemented |
| DSSE predicate definitions | AUTH-REACH-401-005 | Schema draft pending |
| Provenance harness (PROB0101) | AUTH-VERIFY-186-007 | In progress |
| Sovereign crypto keystore plan | AUTH-CRYPTO-90-001 | ✅ Prep published |
### Next actions
1. Complete AUTH-CRYPTO-90-001 provider registry wiring (Sprint 0514).
2. Coordinate DSSE predicate schema with Signer guild for AUTH-REACH-401-005 (Sprint 0401).
3. Monitor PROB0101 provenance harness for AUTH-VERIFY-186-007 (Sprint 186).

@@ -1,12 +1,121 @@
# Notifier Tenancy Prep — PREP-NOTIFY-TEN-48-001
Status: Implemented (2025-11-27)
Owners: Notifications Service Guild
Scope: Tenancy model and DAL/routes for tenant context in Notifier WebService.
## Overview
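Tenant scoping for the Notifier module ensures that rules, templates, incidents, and channels
are isolated per tenant. MongoDB has no native row-level security (RLS), so isolation is enforced in the storage layer through mandatory tenant filters and tenant-prefixed document IDs.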
## Implementation Summary
### 1. Tenant Context Service (`src/Notifier/StellaOps.Notifier.Worker/Tenancy/`)
- **TenantContext.cs**: AsyncLocal-based context propagation for tenant ID and actor
- **TenantServiceExtensions.cs**: DI registration and configuration options
- **ITenantAccessor**: Interface for accessing tenant from HTTP context
Key pattern:
```csharp
// Set tenant context for async scope
using var scope = tenantContext.SetContext(tenantId, actor);
await ProcessEventAsync();

// Or with extension method
await tenantContext.WithTenantAsync(tenantId, actor, async () =>
{
    await ProcessNotificationAsync();
});
```
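The `TenantContext.cs` source is not shown here, so the following is only a minimal sketch of how AsyncLocal-based propagation could behave. The `current` slot and the local `WithTenantAsync` function are hypothetical stand-ins, not the shipped API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical ambient slot; the real TenantContext.cs may be structured differently.
var current = new AsyncLocal<(string TenantId, string Actor)?>();

// Runs `work` with the tenant set, restoring the previous value afterwards.
async Task WithTenantAsync(string tenantId, string actor, Func<Task> work)
{
    var previous = current.Value;
    current.Value = (tenantId, actor);
    try { await work(); }
    finally { current.Value = previous; }
}

await WithTenantAsync("tenant-123", "worker", async () =>
{
    await Task.Yield(); // the value flows across await points
    Console.WriteLine(current.Value?.TenantId); // tenant-123
});
Console.WriteLine(current.Value is null); // True: nothing leaks to the caller
```

The design point is that `AsyncLocal<T>` follows the logical call flow rather than the thread, so a worker handling several tenants concurrently should not see another tenant's value.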
### 2. Incident Repository (`src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/`)
New files:
- **Repositories/INotifyIncidentRepository.cs**: Repository interface for incident persistence
- **Repositories/NotifyIncidentRepository.cs**: MongoDB implementation with tenant filtering
- **Serialization/NotifyIncidentDocumentMapper.cs**: BSON serialization for incidents
Key features:
- All queries include mandatory `tenantId` filter
- Document IDs use `{tenantId}:{resourceId}` composite pattern for RLS
- Correlation key lookup scoped to tenant
- Soft delete support with `deletedAt` field
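As a rough illustration of the composite ID pattern, the sketch below builds and checks `{tenantId}:{resourceId}` keys. The helper names `BuildDocumentId` and `BelongsToTenant` are assumptions for illustration; the actual repository code is not shown in this document.

```csharp
using System;

// Hypothetical helper illustrating the `{tenantId}:{resourceId}` pattern.
static string BuildDocumentId(string tenantId, string resourceId)
{
    if (string.IsNullOrWhiteSpace(tenantId)) throw new ArgumentException("Tenant is required.", nameof(tenantId));
    if (string.IsNullOrWhiteSpace(resourceId)) throw new ArgumentException("Resource is required.", nameof(resourceId));
    // A ':' inside the tenant would make the prefix ambiguous.
    if (tenantId.Contains(':')) throw new ArgumentException("Tenant must not contain ':'.", nameof(tenantId));
    return $"{tenantId}:{resourceId}";
}

// Defensive check that could be applied to anything read back from storage.
static bool BelongsToTenant(string documentId, string tenantId) =>
    documentId.StartsWith(tenantId + ":", StringComparison.Ordinal);

var id = BuildDocumentId("tenant-123", "incident-42");
Console.WriteLine(id);                               // tenant-123:incident-42
Console.WriteLine(BelongsToTenant(id, "tenant-12")); // False
```

Including the separator in the prefix check is what stops `tenant-12` from matching documents owned by `tenant-123`.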
### 3. MongoDB Indexes (tenant-scoped)
Added in `EnsureNotifyIndexesMigration.cs`:
```javascript
// incidents collection
{ tenantId: 1, status: 1, lastOccurrence: -1 } // Status filtering
{ tenantId: 1, correlationKey: 1, status: 1 } // Correlation lookup
```
### 4. Existing Tenancy Infrastructure
The following was already in place:
- All models have `TenantId` property (NotifyRule, NotifyChannel, NotifyTemplate, etc.)
- Repository interfaces take `tenantId` as parameter
- Endpoints extract tenant from `X-StellaOps-Tenant` header
- MongoDB document IDs use tenant-prefixed composite keys
## Configuration
```json
{
  "Notifier": {
    "Tenant": {
      "TenantIdHeader": "X-StellaOps-Tenant",
      "ActorHeader": "X-StellaOps-Actor",
      "RequireTenant": true,
      "DefaultActor": "system",
      "ExcludedPaths": ["/health", "/ready", "/metrics", "/openapi"]
    }
  }
}
```
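How these options might combine at request time can be sketched as follows. The values mirror the JSON above, but the `Resolve` function and its prefix-matching rule for excluded paths are assumptions, not the shipped middleware.

```csharp
using System;
using System.Collections.Generic;

// Values mirror the configuration; the resolution logic itself is an assumption.
string[] excludedPaths = { "/health", "/ready", "/metrics", "/openapi" };
var tenantHeader = "X-StellaOps-Tenant";
var actorHeader = "X-StellaOps-Actor";
var defaultActor = "system";
var requireTenant = true;

// Returns whether the request may proceed, plus the resolved tenant and actor.
(bool Allowed, string? TenantId, string Actor) Resolve(string path, IReadOnlyDictionary<string, string> headers)
{
    foreach (var prefix in excludedPaths)
    {
        if (path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            return (true, null, defaultActor); // probes carry no tenant
    }

    var actor = headers.TryGetValue(actorHeader, out var a) && !string.IsNullOrWhiteSpace(a) ? a : defaultActor;
    if (!headers.TryGetValue(tenantHeader, out var tenant) || string.IsNullOrWhiteSpace(tenant))
        return (!requireTenant, null, actor);

    return (true, tenant, actor);
}

// HTTP header names are case-insensitive, hence the comparer.
var requestHeaders = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["x-stellaops-tenant"] = "tenant-123",
};
Console.WriteLine(Resolve("/api/v2/rules", requestHeaders)); // (True, tenant-123, system)
```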
## Usage Examples
### HTTP API
```http
GET /api/v2/rules HTTP/1.1
X-StellaOps-Tenant: tenant-123
X-StellaOps-Actor: user@example.com
```
### Worker Processing
```csharp
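public class NotificationProcessor
{
    private readonly ITenantContext _tenantContext;
    // Repository type name is assumed; it is not shown in this document.
    private readonly INotifyRuleRepository _rules;

    public NotificationProcessor(ITenantContext tenantContext, INotifyRuleRepository rules)
    {
        _tenantContext = tenantContext;
        _rules = rules;
    }

    public async Task ProcessAsync(NotifyEvent @event)
    {
        // The scope is disposed when the method exits
        using var scope = _tenantContext.SetContext(@event.TenantId, "worker");

        // All subsequent operations are scoped to tenant
        var rules = await _rules.ListAsync(@event.TenantId);
        // ...
    }
}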
```
## Handoff Notes
- Incident storage moved from in-memory to MongoDB with full tenant isolation
- Worker should use `ITenantContext.SetContext()` before processing events
- All new repositories MUST include tenant filtering in queries
- Test tenant isolation with multi-tenant integration tests
## Related Files
- `src/Notifier/StellaOps.Notifier.Worker/Tenancy/TenantContext.cs`
- `src/Notifier/StellaOps.Notifier.Worker/Tenancy/TenantServiceExtensions.cs`
- `src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/Repositories/INotifyIncidentRepository.cs`
- `src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/Repositories/NotifyIncidentRepository.cs`
- `src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/Serialization/NotifyIncidentDocumentMapper.cs`
- `src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/Options/NotifyMongoOptions.cs` (added IncidentsCollection)
- `src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/Migrations/EnsureNotifyIndexesMigration.cs` (added incident indexes)
- `src/Notify/__Libraries/StellaOps.Notify.Storage.Mongo/ServiceCollectionExtensions.cs` (added INotifyIncidentRepository registration)


@@ -59,3 +59,97 @@
## Definition of done
- Notify service, workers, connectors, Console/CLI, observability, and Offline Kit assets shipped with documentation and runbooks.
- Compliance checklist appended to docs; ./TASKS.md and ../../TASKS.md updated with progress.
---
## Sprint readiness tracker
> Last updated: 2025-11-27 (NOTIFY-ENG-0001)
This section maps delivery phases to implementation sprints and tracks readiness checkpoints.
### Phase 1 — Core rules engine & delivery ledger
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-37-001 | ✅ DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Pack approval contract published (OpenAPI schema, payloads). |
| NOTIFY-SVC-37-002 | ✅ DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Ingestion endpoint with Mongo persistence, idempotent writes, audit trail. |
| NOTIFY-SVC-37-003 | 🔄 DOING | SPRINT_0172_0001_0002_notifier_ii | Approval/policy templates, routing predicates; dispatch/rendering pending. |
| NOTIFY-SVC-37-004 | ✅ DONE (2025-11-24) | SPRINT_0172_0001_0002_notifier_ii | Acknowledgement API, test harness, metrics. |
| NOTIFY-OAS-61-001 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | OAS with rules/templates/incidents/quiet hours endpoints. |
| NOTIFY-OAS-61-002 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | `/.well-known/openapi` discovery endpoint. |
| NOTIFY-OAS-62-001 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | SDK examples for rule CRUD. |
| NOTIFY-OAS-63-001 | ✅ DONE (2025-11-17) | SPRINT_0171_0001_0001_notifier_i | Deprecation headers and templates. |
**Checkpoint:** Core rules engine mostly complete; template dispatch/rendering in progress.
### Phase 2 — Connectors & rendering
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-38-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Channel adapters (email, chat webhook, generic webhook) with retry policies. |
| NOTIFY-SVC-38-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Template service, renderer with redaction and localization. |
| NOTIFY-SVC-38-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | REST + WS APIs for rules CRUD, templates preview, incidents. |
| NOTIFY-DOC-70-001 | ✅ DONE (2025-11-02) | SPRINT_0171_0001_0001_notifier_i | Architecture docs for `src/Notify` vs `src/Notifier` split. |
**Checkpoint:** Connector and rendering work not yet started; depends on Phase 1 completion.
### Phase 3 — Console & CLI authoring
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-39-001 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Correlation engine with throttler, quiet hours, incident lifecycle. |
| NOTIFY-SVC-39-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Digest generator with schedule runner. |
| NOTIFY-SVC-39-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Simulation engine for dry-run rules against historical events. |
| NOTIFY-SVC-39-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Quiet hour calendars with audit logging. |
**Checkpoint:** Console/CLI authoring work not started; depends on Phase 2 completion.
### Phase 4 — Governance & observability
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-SVC-40-001 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Escalations, on-call schedules, PagerDuty/OpsGenie adapters. |
| NOTIFY-SVC-40-002 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Summary storm breaker, localization bundles. |
| NOTIFY-SVC-40-003 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Security hardening (signed ack links, webhook HMAC). |
| NOTIFY-SVC-40-004 | 📝 TODO | SPRINT_0172_0001_0002_notifier_ii | Observability metrics/traces, dead-letter handling, chaos tests. |
| NOTIFY-OBS-51-001 | ✅ DONE (2025-11-22) | SPRINT_0171_0001_0001_notifier_i | SLO evaluator webhooks with templates/routing/suppression. |
| NOTIFY-OBS-55-001 | ✅ DONE (2025-11-22) | SPRINT_0171_0001_0001_notifier_i | Incident mode templates with evidence/trace/retention context. |
| NOTIFY-ATTEST-74-001 | ✅ DONE (2025-11-16) | SPRINT_0171_0001_0001_notifier_i | Templates for verification failures, key revocations, transparency. |
| NOTIFY-ATTEST-74-002 | 📝 TODO | SPRINT_0171_0001_0001_notifier_i | Wire notifications to key rotation/revocation events. |
| NOTIFY-RISK-66-001 | ⏳ BLOCKED | SPRINT_0171_0001_0001_notifier_i | Risk severity escalation triggers; needs POLICY-RISK-40-002. |
| NOTIFY-RISK-67-001 | ⏳ BLOCKED | SPRINT_0171_0001_0001_notifier_i | Risk profile publish/deprecate notifications. |
| NOTIFY-RISK-68-001 | ⏳ BLOCKED | SPRINT_0171_0001_0001_notifier_i | Per-profile routing, quiet hours, dedupe. |
**Checkpoint:** Core observability complete; governance and risk notifications blocked on upstream dependencies.
### Phase 5 — Offline & compliance
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| NOTIFY-AIRGAP-56-002 | ✅ DONE | SPRINT_0171_0001_0001_notifier_i | Bootstrap Pack with deterministic secrets and offline validation. |
| NOTIFY-TEN-48-001 | ⏳ BLOCKED | SPRINT_0173_0001_0003_notifier_iii | Tenant-scope rules/templates; needs Sprint 0172 tenancy model. |
**Checkpoint:** Offline basics complete; tenancy work blocked on upstream Sprint 0172.
---
### Overall readiness summary
| Phase | Status | Blocking items |
|-------|--------|----------------|
| **1 Core rules engine** | 🔄 In progress | NOTIFY-SVC-37-003 dispatch/rendering |
| **2 Connectors & rendering** | 📝 Not started | Phase 1 completion |
| **3 Console & CLI** | 📝 Not started | Phase 2 completion |
| **4 Governance & observability** | 🔄 Partial | POLICY-RISK-40-002 for risk notifications |
| **5 Offline & compliance** | 🔄 Partial | Sprint 0172 tenancy model |
### Cross-module dependencies
| Dependency | Required by | Status |
|------------|-------------|--------|
| Attestor payload localization | NOTIFY-ATTEST-74-002 | Freeze pending |
| POLICY-RISK-40-002 export | NOTIFY-RISK-66/67/68 | BLOCKED |
| Sprint 0172 tenancy model | NOTIFY-TEN-48-001 | In progress |
| Telemetry SLO webhook schema | NOTIFY-OBS-51-001 | ✅ Published (`docs/notifications/slo-webhook-schema.md`) |
### Next actions
1. Complete NOTIFY-SVC-37-003 dispatch/rendering wiring (Sprint 0172).
2. Start NOTIFY-SVC-38-002 channel adapters once Phase 1 closes.
3. Track POLICY-RISK-40-002 to unblock risk notification tasks.
4. Monitor Sprint 0172 tenancy model for NOTIFY-TEN-48-001.


@@ -59,3 +59,78 @@
- Export Center + Attestor dependencies validated; CLI parity confirmed.
- Documentation updated (README, architecture, runbooks, CLI guides) with imposed rule compliance.
- ./TASKS.md and ../../TASKS.md reflect the latest status transitions.
---
## Sprint readiness tracker
> Last updated: 2025-11-27 (SIGNER-ENG-0001)
This section maps delivery phases to implementation sprints and tracks readiness checkpoints.
### Phase 1 — Core service & PoE
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| KMSI-73-001 | ✅ DONE (2025-11-03) | SPRINT_100_identity_signing | KMS key management foundations with staffing + DSSE contract. |
| KMSI-73-002 | ✅ DONE (2025-11-03) | SPRINT_100_identity_signing | FIDO2 profile integration. |
| PROV-OBS-53-001 | ✅ DONE (2025-11-17) | SPRINT_0513_0001_0001_provenance | DSSE/SLSA BuildDefinition + BuildMetadata models with canonical JSON serializer. |
| PROV-OBS-53-002 | ✅ DONE (2025-11-23) | SPRINT_0513_0001_0001_provenance | Signer abstraction (cosign/KMS/offline) with key rotation hooks and audit logging. |
| SEC-CRYPTO-90-020 | 🔄 IN PROGRESS | SPRINT_0514_0001_0001_sovereign_crypto | CryptoPro signer plugin; Windows CSP runner pending. |
**Checkpoint:** Core signing infrastructure operational — KMS drivers, signer abstractions, and DSSE models delivered.
### Phase 2 — Export Center integration
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| PROV-OBS-53-003 | ✅ DONE (2025-11-23) | SPRINT_0513_0001_0001_provenance | PromotionAttestationBuilder feeding canonicalised payloads to Signer. |
| SIGN-REPLAY-186-003 | 📝 TODO | SPRINT_186_record_deterministic_execution | Extend Signer/Authority DSSE flows for replay manifest/bundle payloads. |
| SIGN-CORE-186-004 | 📝 TODO | SPRINT_186_record_deterministic_execution | Replace HMAC demo with StellaOps.Cryptography providers (keyless + KMS). |
| SIGN-CORE-186-005 | 📝 TODO | SPRINT_186_record_deterministic_execution | Refactor SignerStatementBuilder for StellaOps predicate types. |
| SIGN-TEST-186-006 | 📝 TODO | SPRINT_186_record_deterministic_execution | Upgrade signer integration tests with real crypto + fixture predicates. |
**Checkpoint:** Export Center signing APIs partially complete; replay manifest support and crypto provider refactoring pending.
### Phase 3 — Attestor alignment
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| AUTH-REACH-401-005 | 📝 TODO | SPRINT_0401_0001_0001_reachability_evidence_chain | DSSE predicate types for SBOM/Graph/VEX/Replay; blocked on predicate definitions. |
| SIGN-VEX-401-018 | 📝 TODO | SPRINT_0401_0001_0001_reachability_evidence_chain | Extend predicate catalog with `stella.ops/vexDecision@v1`. |
| PROV-OBS-54-001 | 📝 TODO | SPRINT_0513_0001_0001_provenance | Verification library for DSSE signatures, Merkle roots, timeline chain. |
| PROV-OBS-54-002 | 📝 TODO | SPRINT_0513_0001_0001_provenance | .NET global tool for local verification + CLI `stella forensic verify`. |
**Checkpoint:** Attestor DSSE alignment pending; predicate catalog extension and verification library not started.
### Phase 4 — Observability & resilience
| Task ID | Status | Sprint | Notes |
|---------|--------|--------|-------|
| DOCS-PROMO-70-001 | 📝 TODO | SPRINT_304_docs_tasks_md_iv | Promotion attestations doc (CLI commands, Signer/Attestor integration, offline verification). |
| CLI-PROMO-70-002 | 📝 TODO | SPRINT_203_cli_iii | `stella promotion attest` / `promotion verify` commands. |
| CLI-FORENSICS-54-002 | 📝 TODO | SPRINT_202_cli_ii | `stella forensic attest show <artifact>` listing signer details. |
**Checkpoint:** Observability and CLI integration pending; waiting on upstream signing pipeline completion.
---
### Overall readiness summary
| Phase | Status | Blocking items |
|-------|--------|----------------|
| **1 Core service & PoE** | ✅ Complete | — |
| **2 Export Center integration** | 🔄 In progress | SIGN-CORE-186-004/005 crypto provider refactoring |
| **3 Attestor alignment** | 📝 Not started | AUTH-REACH-401-005 predicate definitions |
| **4 Observability & resilience** | 📝 Not started | Upstream phase completion |
### Cross-module dependencies
| Dependency | Required by | Status |
|------------|-------------|--------|
| Attestor DSSE bundle schema | SIGN-VEX-401-018 | Documented in `docs/modules/attestor/architecture.md` §1 |
| Provenance library canonicalisation | SIGN-CORE-186-005 | Available via PROV-OBS-53-001/002 |
| Export Center bundle manifest | SIGN-REPLAY-186-003 | Pending Sprint 162/163 deliverables |
| Authority predicate definitions | AUTH-REACH-401-005 | Schema draft pending |
### Next actions
1. Complete CryptoPro signer plugin Windows smoke test (SEC-CRYPTO-90-020, Sprint 0514).
2. Start SIGN-CORE-186-004 once replay bundle schema finalises (Sprint 186).
3. Track AUTH-REACH-401-005 predicate schema draft for Attestor alignment (Sprint 401).
4. Monitor PROV-OBS-54-001/002 for verification library availability.


@@ -0,0 +1,259 @@
# Pack Approvals Notification Contract
> **Status:** Implemented (NOTIFY-SVC-37-001)
> **Last Updated:** 2025-11-27
> **OpenAPI Spec:** `src/Notifier/StellaOps.Notifier/StellaOps.Notifier.WebService/openapi/pack-approvals.yaml`
## Overview
This document defines the canonical contract for pack approval notifications between Task Runner and the Notifier service. It covers event payloads, resume token mechanics, error handling, and security requirements.
## Event Kinds
| Kind | Description | Trigger |
|------|-------------|---------|
| `pack.approval.requested` | Approval required for pack deployment | Task Runner initiates deployment requiring approval |
| `pack.approval.updated` | Approval state changed | Decision recorded or timeout |
| `pack.policy.hold` | Policy gate blocked deployment | Policy Engine rejects deployment |
| `pack.policy.released` | Policy hold lifted | Policy conditions satisfied |
## Canonical Event Schema
```json
{
  "eventId": "550e8400-e29b-41d4-a716-446655440000",
  "issuedAt": "2025-11-27T10:30:00Z",
  "kind": "pack.approval.requested",
  "packId": "pkg:oci/stellaops/scanner@v2.1.0",
  "policy": {
    "id": "policy-prod-deploy",
    "version": "1.2.3"
  },
  "decision": "pending",
  "actor": "ci-pipeline@stellaops.example.com",
  "resumeToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9...",
  "summary": "Deployment approval required for production scanner update",
  "labels": {
    "environment": "production",
    "team": "security"
  }
}
```
### Required Fields
| Field | Type | Description |
|-------|------|-------------|
| `eventId` | UUID | Unique event identifier; used for deduplication |
| `issuedAt` | ISO 8601 | Event timestamp in UTC |
| `kind` | string | Event type (see Event Kinds table) |
| `packId` | string | Package identifier in PURL format |
| `decision` | string | Current state: `pending`, `approved`, `rejected`, `hold`, `expired` |
| `actor` | string | Identity that triggered the event |
### Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| `policy` | object | Policy metadata (`id`, `version`) |
| `resumeToken` | string | Opaque token for Task Runner resume flow |
| `summary` | string | Human-readable summary for notifications |
| `labels` | object | Custom key-value metadata |
## Resume Token Mechanics
### Token Flow
```
┌─────────────┐      POST /pack-approvals      ┌──────────────┐
│ Task Runner │ ──────────────────────────────►│   Notifier   │
│             │   { resumeToken: "abc123" }    │              │
│             │◄────────────────────────────── │              │
│             │   X-Resume-After: "abc123"     │              │
└─────────────┘                                └──────────────┘
       │                                              │
       │                                              │
       │          User acknowledges approval          │
       │                                              ▼
       │                              ┌──────────────────────────────┐
       │                              │ POST /pack-approvals/{id}/ack│
       │                              │ { ackToken: "..." }          │
       │                              └──────────────────────────────┘
       │                                              │
       │◄─────────────────────────────────────────────┤
       │   Resume callback (webhook or message bus)   │
### Token Properties
- **Format:** Opaque string; clients must not parse or modify
- **TTL:** 24 hours from `issuedAt`
- **Uniqueness:** Scoped to tenant + packId + eventId
- **Expiry Handling:** Expired tokens return `410 Gone`
### X-Resume-After Header
When `resumeToken` is present in the request, the server echoes it in the `X-Resume-After` response header. This enables cursor-based processing for Task Runner polling.
## Error Handling
### HTTP Status Codes
| Code | Meaning | Client Action |
|------|---------|---------------|
| `200` | Duplicate request (idempotent) | Treat as success |
| `202` | Accepted for processing | Continue normal flow |
| `204` | Acknowledgement recorded | Continue normal flow |
| `400` | Validation error | Fix request and retry |
| `401` | Authentication required | Refresh token and retry |
| `403` | Insufficient permissions | Check scope; contact admin |
| `404` | Resource not found | Verify packId; may have expired |
| `410` | Token expired | Re-initiate approval flow |
| `429` | Rate limited | Retry after `Retry-After` seconds |
| `5xx` | Server error | Retry with exponential backoff |
### Error Response Format
```json
{
"error": {
"code": "invalid_request",
"message": "eventId, packId, kind, decision, actor are required.",
"traceId": "00-abc123-def456-00"
}
}
```
### Retry Strategy
- **Transient errors (5xx, 429):** Exponential backoff starting at 1s, max 60s, max 5 retries
- **Validation errors (4xx except 429):** Do not retry; fix request
- **Idempotency:** Safe to retry any request with the same `Idempotency-Key`
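A client-side sketch of this retry strategy might look like the following. `SendWithRetryAsync`, `IsTransient`, and `BackoffDelay` are hypothetical helper names; the doubling schedule is one reasonable reading of "exponential backoff starting at 1s, max 60s", not a documented requirement.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static bool IsTransient(HttpStatusCode code) =>
    (int)code >= 500 || code == HttpStatusCode.TooManyRequests;

// Delay before retry `attempt` (1-based): 1s, 2s, 4s, ... capped at 60s.
static TimeSpan BackoffDelay(int attempt) =>
    TimeSpan.FromSeconds(Math.Min(60, Math.Pow(2, attempt - 1)));

// `send` must build a fresh request on every call and reuse the same Idempotency-Key.
static async Task<HttpResponseMessage> SendWithRetryAsync(
    Func<Task<HttpResponseMessage>> send, int maxRetries = 5)
{
    for (var retry = 0; ; retry++)
    {
        var response = await send();
        if (!IsTransient(response.StatusCode) || retry == maxRetries)
            return response; // success, non-retryable 4xx, or retries exhausted

        // Prefer the server's Retry-After hint (sent with 429) over the computed backoff.
        var delay = response.Headers.RetryAfter?.Delta ?? BackoffDelay(retry + 1);
        response.Dispose();
        await Task.Delay(delay);
    }
}

var ok = await SendWithRetryAsync(
    () => Task.FromResult(new HttpResponseMessage(HttpStatusCode.Accepted)));
Console.WriteLine((int)ok.StatusCode); // 202
```

Passing a delegate rather than a prepared `HttpRequestMessage` matters because a request message cannot be sent twice.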
## Security Requirements
### Authentication
All endpoints require a valid OAuth2 bearer token with one of these scopes:
- `packs.approve` — Full approval flow access
- `Notifier.Events:Write` — Event ingestion only
### Tenant Isolation
- `X-StellaOps-Tenant` header is **required** on all requests
- Server validates token tenant claim matches header
- Cross-tenant access returns `403 Forbidden`
### Idempotency
- `Idempotency-Key` header is **required** for POST endpoints
- Keys are scoped to tenant and expire after 15 minutes
- Duplicate requests within the window return `200 OK`
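The window semantics can be sketched with an in-memory store. This is an illustration only: the `Register` function is hypothetical, and a real deployment would presumably back this with shared storage rather than a process-local dictionary.

```csharp
using System;
using System.Collections.Generic;

var window = TimeSpan.FromMinutes(15);
var gate = new object();
var firstSeen = new Dictionary<(string Tenant, string Key), DateTimeOffset>();

// Returns the status the server would answer with:
// 202 for a first request, 200 for a duplicate inside the window.
int Register(string tenant, string key, DateTimeOffset now)
{
    lock (gate)
    {
        if (firstSeen.TryGetValue((tenant, key), out var seenAt) && now - seenAt < window)
            return 200;
        firstSeen[(tenant, key)] = now; // new key, or the previous entry has expired
        return 202;
    }
}

var t0 = DateTimeOffset.UtcNow;
Console.WriteLine(Register("tenant-a", "k1", t0));                // 202
Console.WriteLine(Register("tenant-a", "k1", t0.AddMinutes(5)));  // 200
Console.WriteLine(Register("tenant-b", "k1", t0.AddMinutes(5)));  // 202 (different tenant)
Console.WriteLine(Register("tenant-a", "k1", t0.AddMinutes(16))); // 202 (window elapsed)
```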
### HMAC Signature (Webhooks)
For webhook callbacks from Notifier to Task Runner:
```
X-StellaOps-Signature: sha256=<hex-encoded-signature>
X-StellaOps-Timestamp: <unix-timestamp>
```
Signature computed as:
```
HMAC-SHA256(secret, timestamp + "." + body)
```
Verification requirements:
- Reject if timestamp is >5 minutes old
- Reject if signature does not match
- Reject if body has been modified
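On the receiving side, verification could be sketched as below. `Sign` and `Verify` are hypothetical helper names. Two choices here are assumptions: the hex digest is lowercase, and timestamps more than five minutes in the future are rejected as well as stale ones.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Computes "sha256=<hex>" over timestamp + "." + body, using the raw header text.
static string Sign(byte[] secret, string timestamp, string body)
{
    using var hmac = new HMACSHA256(secret);
    var mac = hmac.ComputeHash(Encoding.UTF8.GetBytes(timestamp + "." + body));
    return "sha256=" + Convert.ToHexString(mac).ToLowerInvariant();
}

static bool Verify(byte[] secret, string signatureHeader, string timestampHeader, string body, DateTimeOffset now)
{
    // Digits only; anything else is rejected outright.
    if (!long.TryParse(timestampHeader, NumberStyles.None, CultureInfo.InvariantCulture, out var ts))
        return false;

    var nowSeconds = now.ToUnixTimeSeconds();
    if (ts < nowSeconds - 300 || ts > nowSeconds + 300)
        return false; // outside the 5-minute tolerance

    // Constant-time comparison avoids leaking how many characters matched.
    var expected = Encoding.ASCII.GetBytes(Sign(secret, timestampHeader, body));
    var actual = Encoding.ASCII.GetBytes(signatureHeader);
    return CryptographicOperations.FixedTimeEquals(expected, actual);
}

var secret = Encoding.UTF8.GetBytes("demo-secret");
var now = DateTimeOffset.UtcNow;
var stamp = now.ToUnixTimeSeconds().ToString(CultureInfo.InvariantCulture);
var signature = Sign(secret, stamp, "{\"decision\":\"approved\"}");
Console.WriteLine(Verify(secret, signature, stamp, "{\"decision\":\"approved\"}", now)); // True
Console.WriteLine(Verify(secret, signature, stamp, "{\"decision\":\"rejected\"}", now)); // False
```

A modified body needs no separate check: any change to the bytes changes the MAC, so the third verification requirement follows from the second.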
### IP Allowlist
Configurable per environment in `notifier:security:ipAllowlist`:
```yaml
notifier:
  security:
    ipAllowlist:
      - "10.0.0.0/8"
      - "192.168.1.100"
```
### Sensitive Data Handling
- **Resume tokens:** Encrypted at rest; never logged in full
- **Ack tokens:** Signed with KMS; validated on acknowledgement
- **Labels:** Redacted if keys match `secret`, `password`, `token`, `key` patterns
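Label redaction could be approximated as shown here. The document does not define what "match" means, so this sketch assumes a case-insensitive substring match on the label key. `RedactLabels` and the `[REDACTED]` placeholder are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] sensitiveFragments = { "secret", "password", "token", "key" };

// Replaces the value of any label whose key contains a sensitive fragment.
Dictionary<string, string> RedactLabels(IReadOnlyDictionary<string, string> labels) =>
    labels.ToDictionary(
        kv => kv.Key,
        kv => sensitiveFragments.Any(f => kv.Key.Contains(f, StringComparison.OrdinalIgnoreCase))
            ? "[REDACTED]"
            : kv.Value);

var redacted = RedactLabels(new Dictionary<string, string>
{
    ["environment"] = "production",
    ["apiKey"] = "abc123",
});
Console.WriteLine(redacted["environment"]); // production
Console.WriteLine(redacted["apiKey"]);      // [REDACTED]
```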
## Audit Trail
All operations emit structured audit events:
| Event | Fields | Retention |
|-------|--------|-----------|
| `pack.approval.ingested` | packId, kind, decision, actor, eventId | 90 days |
| `pack.approval.acknowledged` | packId, ackToken, decision, actor | 90 days |
| `pack.policy.hold` | packId, policyId, reason | 90 days |
## Observability
### Metrics
| Metric | Type | Labels |
|--------|------|--------|
| `notifier_pack_approvals_total` | Counter | `kind`, `decision`, `tenant` |
| `notifier_pack_approvals_outstanding` | Gauge | `tenant` |
| `notifier_pack_approval_ack_latency_seconds` | Histogram | `decision` |
| `notifier_pack_approval_errors_total` | Counter | `code`, `tenant` |
### Structured Logs
All operations include:
- `traceId` — Distributed trace correlation
- `tenantId` — Tenant identifier
- `packId` — Package identifier
- `eventId` — Event identifier
## Integration Examples
### Task Runner → Notifier (Ingestion)
```bash
curl -X POST https://notifier.stellaops.example.com/api/v1/notify/pack-approvals \
-H "Authorization: Bearer $TOKEN" \
-H "X-StellaOps-Tenant: tenant-acme-corp" \
-H "Idempotency-Key: $(uuidgen)" \
-H "Content-Type: application/json" \
-d '{
"eventId": "550e8400-e29b-41d4-a716-446655440000",
"issuedAt": "2025-11-27T10:30:00Z",
"kind": "pack.approval.requested",
"packId": "pkg:oci/stellaops/scanner@v2.1.0",
"decision": "pending",
"actor": "ci-pipeline@stellaops.example.com",
"resumeToken": "abc123",
"summary": "Approval required for production deployment"
}'
```
### Console → Notifier (Acknowledgement)
```bash
curl -X POST https://notifier.stellaops.example.com/api/v1/notify/pack-approvals/pkg%3Aoci%2Fstellaops%2Fscanner%40v2.1.0/ack \
-H "Authorization: Bearer $TOKEN" \
-H "X-StellaOps-Tenant: tenant-acme-corp" \
-H "Content-Type: application/json" \
-d '{
"ackToken": "ack-token-xyz789",
"decision": "approved",
"comment": "Reviewed and approved"
}'
```
## Related Documents
- [Pack Approvals Integration Requirements](pack-approvals-integration.md)
- [Notifications Architecture](architecture.md)
- [Notifications API Reference](api.md)
- [Notification Templates](templates.md)


@@ -0,0 +1,684 @@
Here's a practical, first-time-friendly guide to using VEX in StellaOps, plus a concrete .NET pattern you can drop in today.
---
# VEX in a nutshell
* **VEX (Vulnerability Exploitability eXchange)**: a small JSON document that says whether specific CVEs *actually* affect a product/version.
* **OpenVEX**: SBOM-agnostic; references products/components directly (URIs, PURLs, hashes). Great for canonical internal models.
* **CycloneDX VEX / SPDX VEX**: tie VEX statements closely to a specific SBOM instance (component BOM ref IDs). Great when the BOM is your source of truth.
**Our strategy:**
* **Store VEX separately** from SBOMs (deterministic, easier air-gap bundling).
* **Link by strong references** (PURLs + content hashes + optional SBOM component IDs).
* **Translate on ingest** between OpenVEX ↔ CycloneDX VEX as needed so downstream tools stay happy.
---
# Translation model (OpenVEX ↔ CycloneDX VEX)
1. **Identity mapping**
* Prefer **PURL** for packages; fall back to **SHA-256 (or SHA-512)** of the artifact; optionally include the **SBOM `bom-ref`** if known.
2. **Product scope**
* OpenVEX “product” → CycloneDX `affects` with `bom-ref` (if available) or a synthetic ref derived from PURL/hash.
3. **Status mapping**
* `affected | not_affected | under_investigation | fixed` map 1:1.
* Keep **timestamps**, **justification**, **impact statement**, and **origin**.
4. **Evidence**
* Preserve links to advisories, commits, tests; attach as CycloneDX `analysis/evidence` notes (or OpenVEX `metadata/notes`).
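The status mapping can be sketched as a pair of switch expressions. One caveat: as far as I recall, CycloneDX `analysis.state` uses its own vocabulary (`exploitable`, `in_triage`, `resolved`, and so on) rather than the OpenVEX names, so the mapping is one-to-one only in the OpenVEX → CycloneDX direction. Treat the state names below as assumptions to verify against the CycloneDX version in use.

```csharp
using System;

static string ToCycloneDxState(string openVexStatus) => openVexStatus switch
{
    "affected" => "exploitable",
    "not_affected" => "not_affected",
    "under_investigation" => "in_triage",
    "fixed" => "resolved",
    _ => throw new ArgumentOutOfRangeException(nameof(openVexStatus), openVexStatus, "Unknown OpenVEX status"),
};

// The reverse direction is lossy: two CycloneDX states collapse into one OpenVEX status.
static string ToOpenVexStatus(string cycloneDxState) => cycloneDxState switch
{
    "exploitable" => "affected",
    "not_affected" or "false_positive" => "not_affected",
    "in_triage" => "under_investigation",
    "resolved" or "resolved_with_pedigree" => "fixed",
    _ => throw new ArgumentOutOfRangeException(nameof(cycloneDxState), cycloneDxState, "Unknown CycloneDX state"),
};

Console.WriteLine(ToCycloneDxState("under_investigation")); // in_triage
Console.WriteLine(ToOpenVexStatus("false_positive"));       // not_affected
```

Failing loudly on unknown values, rather than defaulting, keeps translation deterministic when a future schema version adds a state.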
**Collision rules (deterministic):**
* A new statement replaces the existing one only if it has:
  * a newer `timestamp`, **and**
  * higher **provenance trust** (signed by vendor/Authority), or equal trust with a lexicographic tie-break on the issuer key ID.
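Read literally, the collision rule could be expressed as the comparison below. `NewWins` is a hypothetical helper, and the tie-break direction (the ordinally smaller key ID wins) is an assumption, since the rule does not say which side of the lexicographic comparison prevails.

```csharp
using System;

static bool NewWins(
    (DateTimeOffset Timestamp, int ProvenanceScore, string IssuerKeyId) existing,
    (DateTimeOffset Timestamp, int ProvenanceScore, string IssuerKeyId) incoming)
{
    // Condition 1: the incoming statement must be strictly newer.
    if (incoming.Timestamp <= existing.Timestamp) return false;

    // Condition 2: higher trust wins outright; lower trust loses outright.
    if (incoming.ProvenanceScore != existing.ProvenanceScore)
        return incoming.ProvenanceScore > existing.ProvenanceScore;

    // Equal trust: ordinal comparison keeps the result culture-independent.
    return string.CompareOrdinal(incoming.IssuerKeyId, existing.IssuerKeyId) < 0;
}

var t = new DateTimeOffset(2024, 1, 1, 0, 0, 0, TimeSpan.Zero);
Console.WriteLine(NewWins((t, 50, "key-b"), (t.AddHours(1), 80, "key-z"))); // True
Console.WriteLine(NewWins((t, 80, "key-b"), (t.AddHours(1), 50, "key-a"))); // False
```

Note that under this reading a newer statement from a less trusted issuer never displaces an older, more trusted one.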
---
# Storage model (MongoDB-friendly)
* **Collections**
* `vex.documents`: one doc per VEX file (OpenVEX or CycloneDX VEX).
* `vex.statements`: *flattened*, one per (product/component, vuln).
* `artifacts`: canonical component index (PURL, hashes, optional SBOM refs).
* **Reference keys**
* `artifactKey = purl || sha256 || (groupId:name:version for .NET/NuGet)`
* `vulnKey = cveId || ghsaId || internalId`
* **Deterministic IDs**
* `_id = sha256(canonicalize(statement-json-without-signature))`
* **Signatures**
* Keep DSSE/Sigstore envelopes in `vex.documents.signatures[]` for audit & replay.
---
# Air-gap bundling
Package **SBOMs + VEX + artifacts index + trust roots** as a single tarball:
```
/bundle/
  sboms/*.json
  vex/*.json                # OpenVEX & CycloneDX VEX allowed
  index/artifacts.jsonl     # purl, hashes, bom-ref map
  trust/rekor.merkle.roots
  trust/fulcio.certs.pem
  trust/keys/*.pub
  manifest.json             # content list + sha256 + issuedAt
```
* **Deterministic replay:** re-ingest is a pure function of the bundle bytes → identical DB state.
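Before replaying a bundle, its files can be checked against the manifest digests. The manifest schema is not specified here, so this sketch assumes it has already been parsed into (relative path, SHA-256 hex) pairs. `FindMismatches` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Returns the relative paths that are missing or whose bytes do not match the manifest.
static List<string> FindMismatches(string bundleRoot, IEnumerable<(string RelativePath, string Sha256)> entries)
{
    var bad = new List<string>();
    foreach (var (relativePath, expected) in entries)
    {
        var full = Path.Combine(bundleRoot, relativePath);
        if (!File.Exists(full)) { bad.Add(relativePath); continue; }

        var actual = Convert.ToHexString(SHA256.HashData(File.ReadAllBytes(full)));
        if (!actual.Equals(expected, StringComparison.OrdinalIgnoreCase))
            bad.Add(relativePath);
    }
    return bad;
}

// Self-contained demo against a throwaway directory.
var root = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(root);
File.WriteAllText(Path.Combine(root, "a.json"), "{}");
var digest = Convert.ToHexString(SHA256.HashData(File.ReadAllBytes(Path.Combine(root, "a.json"))));

Console.WriteLine(FindMismatches(root, new[] { ("a.json", digest) }).Count);       // 0
Console.WriteLine(FindMismatches(root, new[] { ("missing.json", digest) }).Count); // 1
```

Refusing to ingest when the returned list is non-empty is what makes the "pure function of bundle bytes" property checkable.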
---
# .NET 10 implementation (C#): deterministic ingestion
### Core models
```csharp
public record ArtifactRef(
    string? Purl,
    string? Sha256,
    string? BomRef);

public enum VexStatus { Affected, NotAffected, UnderInvestigation, Fixed }

public record VexStatement(
    string StatementId,        // sha256 of canonical payload
    ArtifactRef Artifact,
    string VulnId,             // e.g., "CVE-2024-1234"
    VexStatus Status,
    string? Justification,
    string? ImpactStatement,
    DateTimeOffset Timestamp,
    string IssuerKeyId,        // from DSSE/Signing
    int ProvenanceScore);      // Authority policy
```
### Canonicalizer (stable order, no env fields)
```csharp
// Requires: using System.Text; (for NormalizationForm)
static string Canonicalize(VexStatement s)
{
    // Issuer and provenance are deliberately excluded so the ID depends only on the statement itself.
    var payload = new {
        artifact = new { s.Artifact.Purl, s.Artifact.Sha256, s.Artifact.BomRef },
        vulnId = s.VulnId,
        status = s.Status.ToString(),
        justification = s.Justification,
        impact = s.ImpactStatement,
        timestamp = s.Timestamp.UtcDateTime
    };

    // System.Text.Json emits anonymous-type properties in declaration order in practice,
    // which is what keeps the output stable here.
    var opts = new System.Text.Json.JsonSerializerOptions {
        WriteIndented = false
    };
    string json = System.Text.Json.JsonSerializer.Serialize(payload, opts);

    // Normalize unicode + newline
    json = json.Normalize(NormalizationForm.FormKC).Replace("\r\n", "\n");
    return json;
}

static string Sha256(string s)
{
    using var sha = System.Security.Cryptography.SHA256.Create();
    var bytes = sha.ComputeHash(System.Text.Encoding.UTF8.GetBytes(s));
    return Convert.ToHexString(bytes).ToLowerInvariant();
}
```
### Ingest pipeline
```csharp
public sealed class VexIngestor
{
    readonly IVexParser _parser;         // OpenVEX & CycloneDX adapters
    readonly IArtifactIndex _artifacts;
    readonly IVexRepo _repo;             // Mongo-backed
    readonly IPolicy _policy;            // tie-break rules

    public VexIngestor(IVexParser parser, IArtifactIndex artifacts, IVexRepo repo, IPolicy policy)
    {
        _parser = parser;
        _artifacts = artifacts;
        _repo = repo;
        _policy = policy;
    }

    public async Task IngestAsync(Stream vexJson, SignatureEnvelope? sig)
    {
        var doc = await _parser.ParseAsync(vexJson); // yields normalized statements
        var issuer = sig?.KeyId ?? "unknown";

        foreach (var st in doc.Statements)
        {
            var canon = Canonicalize(st);
            var id = Sha256(canon);
            var withMeta = st with {
                StatementId = id,
                IssuerKeyId = issuer,
                ProvenanceScore = _policy.Score(sig, st)
            };

            // Upsert artifact (purl/hash/bomRef)
            await _artifacts.UpsertAsync(withMeta.Artifact);

            // Deterministic merge
            var existing = await _repo.GetAsync(id)
                ?? await _repo.FindByKeysAsync(withMeta.Artifact, st.VulnId);
            if (existing is null || _policy.IsNewerAndStronger(existing, withMeta))
                await _repo.UpsertAsync(withMeta);
        }

        if (sig is not null) await _repo.AttachSignatureAsync(doc.DocumentId, sig);
    }
}
```
### Parsers (adapters)
* `OpenVexParser`: reads OpenVEX; emits `VexStatement` with `ArtifactRef(PURL/hash)`.
* `CycloneDxVexParser`: resolves `bom-ref` by looking up the PURL/hash via `IArtifactIndex` (if the SBOM is present); if not, stores the `bom-ref` and marks the artifact unresolved for later backfill.
---
# Why this works for StellaOps
* **SBOM-agnostic core** (OpenVEX-first) maps cleanly to your MongoDB canonical stores and `.NET 10` services.
* **SBOM-aware edges** (CycloneDX VEX) are still supported via adapters and `bom-ref` backfill.
* **Deterministic everything**: canonical JSON → SHA-256 IDs → reproducible merges → perfect for audits and offline environments.
* **Air-gap ready**: single bundle with trust roots, replayable on any node.
---
# Next steps (plug-and-play)
1. Implement the two parsers (`OpenVexParser`, `CycloneDxVexParser`).
2. Add the repo/index interfaces to your `StellaOps.Vexer` service:
* `IVexRepo` (Mongo collections `vex.documents`, `vex.statements`)
* `IArtifactIndex` (your canonical PURL/hash map)
3. Wire `Policy` to Authority to score signatures and apply tie-breaks.
4. Add a `bundle ingest` CLI: `vexer ingest /bundle/manifest.json`.
5. Expose GraphQL (HotChocolate) queries:
* `vexStatements(artifactKey, vulnId)`, `vexStatus(artifactKey)`, `evidence(...)`.
If you want, I can generate the exact Mongo schemas, HotChocolate types, and a minimal test bundle to validate the ingest end-to-end.
Below is a complete, developer-ready implementation plan for the **VEX ingestion, translation, canonicalization, storage, and merge-policy pipeline** inside **Stella Ops.Vexer**, aligned with your architecture, deterministic requirements, MongoDB model, DSSE/Authority workflow, and `.NET 10` standards.
This is structured so an average developer can follow it step-by-step without ambiguity.
It is broken into phases, each with clear tasks, acceptance criteria, failure modes, interfaces, and code pointers.
---
# Stella Ops.Vexer
## Full Implementation Plan (Developer-Executable)
---
# 1. Core Objectives
Develop a deterministic, replayable, SBOM-agnostic but SBOM-compatible VEX subsystem supporting:
* OpenVEX and CycloneDX VEX ingestion.
* Canonicalization → SHA-256 identity.
* Cross-linking to artifacts (purl, hash, bom-ref).
* Merge policies driven by Authority trust/lattice rules.
* Complete offline reproducibility.
* MongoDB canonical storage.
* Exposed through gRPC/REST/GraphQL.
---
# 2. Module Structure (to be implemented)
```
src/StellaOps.Vexer/
  Application/
    Commands/
    Queries/
    Ingest/
    Translation/
    Merge/
    Policies/
  Domain/
    Entities/
    ValueObjects/
    Services/
  Infrastructure/
    Mongo/
    AuthorityClient/
    Hashing/
    Signature/
    BlobStore/
  Presentation/
    GraphQL/
    REST/
    gRPC/
```
Every subfolder must compile in strict mode (treat warnings as errors).
---
# 3. Data Model (MongoDB)
## 3.1 `vex.statements` collection
Document schema:
```json
{
  "_id": "sha256(canonical-json)",
  "artifact": {
    "purl": "pkg:nuget/... or null",
    "sha256": "hex or null",
    "bomRef": "optional ref",
    "resolved": true | false
  },
  "vulnId": "CVE-XXXX-YYYY",
  "status": "affected | not_affected | under_investigation | fixed",
  "justification": "...",
  "impact": "...",
  "timestamp": "2024-01-01T12:34:56Z",
  "issuerKeyId": "FULCIO-KEY-ID",
  "provenanceScore": 0-100,
  "documentId": "UUID of vex.documents entry",
  "sourceFormat": "openvex|cyclonedx",
  "createdAt": "...",
  "updatedAt": "..."
}
```
## 3.2 `vex.documents` collection
```
{
  "_id": "<uuid>",
  "format": "openvex|cyclonedx",
  "rawBlobId": "<blob-id in blobstore>",
  "signatures": [
    {
      "type": "dsse",
      "verified": true,
      "issuerKeyId": "F-123...",
      "timestamp": "...",
      "bundleEvidence": {...}
    }
  ],
  "ingestedAt": "...",
  "statementIds": ["sha256-1", "sha256-2", ...]
}
```
---
# 4. Components to Implement
## 4.1 Parsing Layer
### Interfaces
```csharp
public interface IVexParser
{
    ValueTask<ParsedVexDocument> ParseAsync(Stream jsonStream);
}

public sealed record ParsedVexDocument(
    string DocumentId,
    string Format,
    IReadOnlyList<ParsedVexStatement> Statements);
```
### Tasks
1. Implement `OpenVexParser`.
* Use System.Text.Json source generators.
* Validate OpenVEX schema version.
* Extract product → component mapping.
* Map to internal `ArtifactRef`.
2. Implement `CycloneDxVexParser`.
* Support 1.5+ “vex” extension.
* bom-ref resolution through `IArtifactIndex`.
* Mark unresolved `bom-ref` but store them.
### Acceptance Criteria
* Both parsers produce identical internal representation of statements.
* Unknown fields must not corrupt canonicalization.
* 100% deterministic mapping for same input.
---
## 4.2 Canonicalizer
Implement deterministic ordering, UTF-8 normalization, stable JSON.
### Tasks
1. Create `Canonicalizer` class.
2. Apply:
* Property order: artifact, vulnId, status, justification, impact, timestamp.
* Remove optional metadata (issuerKeyId, provenance).
* Normalize Unicode → NFKC.
* Replace CRLF → LF.
3. Generate SHA-256.
### Interface
```csharp
public interface IVexCanonicalizer
{
    string Canonicalize(VexStatement s);
    string ComputeId(string canonicalJson);
}
```
### Acceptance Criteria
* Hash is identical across operating systems, time zones, locales, and machines.
* Replaying the same bundle yields same `_id`.
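
The canonicalization tasks above can be sketched in a few lines. This is only an illustration in Python, not the planned C# `Canonicalizer`: the function names mirror `IVexCanonicalizer` in snake case, and two details are assumptions that the plan does not state, namely that nested object keys are sorted alphabetically and that the JSON is emitted compactly with no extra whitespace.

```python
import hashlib
import json
import unicodedata

# Top-level property order from the task list; metadata such as
# issuerKeyId and provenance is deliberately not part of the identity.
CANONICAL_FIELDS = ("artifact", "vulnId", "status", "justification", "impact", "timestamp")


def _normalize(value):
    """Apply CRLF -> LF and NFKC to strings; sort nested object keys (assumption)."""
    if isinstance(value, str):
        return unicodedata.normalize("NFKC", value.replace("\r\n", "\n"))
    if isinstance(value, dict):
        return {key: _normalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [_normalize(item) for item in value]
    return value


def canonicalize(statement: dict) -> str:
    """Return compact JSON with the top-level fields in the fixed canonical order."""
    ordered = {name: _normalize(statement.get(name)) for name in CANONICAL_FIELDS}
    return json.dumps(ordered, ensure_ascii=False, separators=(",", ":"))


def compute_id(canonical_json: str) -> str:
    """SHA-256 over the UTF-8 bytes of the canonical JSON, as lowercase hex."""
    return hashlib.sha256(canonical_json.encode("utf-8")).hexdigest()
```

Two statements that differ only in key order, line endings, or issuer metadata should hash to the same `_id` under this sketch, which is the property the acceptance criteria ask for.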
---
## 4.3 Authority / Signature Verification
### Tasks
1. Implement DSSE envelope reader.
2. Integrate Authority client:
* Verify certificate chain (Fulcio/GOST/eIDAS etc).
* Obtain trust lattice score.
* Produce `ProvenanceScore`: int.
### Interface
```csharp
public interface ISignatureVerifier
{
    ValueTask<SignatureVerificationResult> VerifyAsync(Stream payload, Stream envelope);
}
```
### Acceptance Criteria
* If verification fails → Vexer stores document but flags signature invalid.
* Scores map to priority in merge policy.
---
## 4.4 Merge Policies
### Implement Default Policy
1. Newer timestamp wins.
2. If timestamps equal:
* Higher provenance score wins.
* If both equal, lexicographically smaller issuerKeyId wins.
### Interface
```csharp
public interface IVexMergePolicy
{
    bool ShouldReplace(VexStatement existing, VexStatement incoming);
}
```
### Acceptance Criteria
* Merge decisions reproducible.
* Deterministic ordering even when values equal.
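
As a rough illustration of the default policy, the three-step comparison can be written as below. The `Statement` dataclass and its field names are hypothetical stand-ins for `VexStatement`, and keeping the existing statement when all three values are equal is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for VexStatement with only the merge-relevant fields."""
    timestamp: datetime
    provenance_score: int
    issuer_key_id: str


def should_replace(existing: Statement, incoming: Statement) -> bool:
    # 1. Newer timestamp wins.
    if incoming.timestamp != existing.timestamp:
        return incoming.timestamp > existing.timestamp
    # 2. On equal timestamps, the higher provenance score wins.
    if incoming.provenance_score != existing.provenance_score:
        return incoming.provenance_score > existing.provenance_score
    # 3. Final tie-breaker: lexicographically smaller issuerKeyId wins.
    #    Python compares strings by code point, so the result does not
    #    depend on locale. A full tie keeps the existing statement.
    return incoming.issuer_key_id < existing.issuer_key_id
```

Whatever language the real policy is written in, the last comparison should be an ordinal (culture-invariant) string comparison; a locale-aware one could order key IDs differently on different machines.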
---
## 4.5 Ingestion Pipeline
### Steps
1. Accept `multipart/form-data` or referenced blob ID.
2. Parse via correct parser.
3. Verify signature (optional).
4. For each statement:
* Canonicalize.
* Compute `_id`.
* Upsert artifact into `artifacts` (via `IArtifactIndex`).
* Resolve bom-ref (if CycloneDX).
* Existing statement? Apply merge policy.
* Insert or update.
5. Create `vex.documents` entry.
### Class
`VexIngestService`
### Required Methods
```csharp
public Task<IngestResult> IngestAsync(VexIngestRequest request);
```
### Acceptance Tests
* Idempotent: ingesting the same VEX document repeatedly leaves the DB unchanged.
* Deterministic under concurrency.
* Air-gap replay produces identical DB state.
---
## 4.6 Translation Layer
### Implement two converters:
* `OpenVexToCycloneDxTranslator`
* `CycloneDxToOpenVexTranslator`
### Rules
* Prefer PURL → hash → synthetic bom-ref.
* Single VEX statement → one CycloneDX “analysis” entry.
* Preserve justification, impact, notes.
### Acceptance Criteria
* Round-trip OpenVEX → CycloneDX → OpenVEX produces equal canonical hashes (except format markers).
---
## 4.7 Artifact Index Backfill
### Reason
CycloneDX VEX may refer to bom-refs not yet known at ingestion.
### Tasks
1. Store unresolved artifacts.
2. Create background `BackfillWorker`:
* Watches `sboms.documents` ingestion events.
* Matches bom-refs.
* Updates statements with resolved PURL/hashes.
* Recomputes canonical JSON + SHA-256 (new version stored as new ID).
3. Marks old unresolved statement as superseded.
### Acceptance Criteria
* Backfilling is monotonic: no overwriting original.
* Deterministic after backfill: same SBOM yields same final ID.
---
## 4.8 Bundle Ingestion (Air-Gap Mode)
### Structure
```
bundle/
sboms/*.json
vex/*.json
index/artifacts.jsonl
trust/*
manifest.json
```
### Tasks
1. Implement `BundleIngestService`.
2. Stages:
* Validate manifest + hashes.
* Import trust roots (local only).
* Ingest SBOMs first.
* Ingest VEX documents.
3. Reproduce same IDs on all nodes.
### Acceptance Criteria
* Byte-identical bundle → byte-identical DB.
* Works offline completely.
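
The "validate manifest + hashes" stage could look roughly like the following. The plan does not define the layout of `manifest.json`, so the shape used here (`{"files": {"relative/path": "<sha256 hex>"}}`) is purely an assumption for illustration, as are the function names.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bundle(bundle_dir: Path) -> list[str]:
    """Return a sorted list of problems; an empty list means the bundle is intact."""
    manifest_path = bundle_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    expected = manifest["files"]  # assumed shape: {"relative/path": "sha256-hex"}
    problems = []
    for rel_path, want in expected.items():
        target = bundle_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(target) != want:
            problems.append(f"hash mismatch: {rel_path}")
    # Files on disk that the manifest does not list are reported as well.
    on_disk = {
        p.relative_to(bundle_dir).as_posix()
        for p in bundle_dir.rglob("*")
        if p.is_file() and p != manifest_path
    }
    problems.extend(f"unlisted: {name}" for name in on_disk - set(expected))
    return sorted(problems)
```

Returning a sorted problem list rather than failing on the first error keeps the validation output itself reproducible across nodes.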
---
# 5. Interfaces for GraphQL/REST/gRPC
Expose:
## Queries
* `vexStatement(id)`
* `vexStatementsByArtifact(purl/hash)`
* `vexStatus(purl)` → latest merged status
* `vexDocument(id)`
* `affectedComponents(vulnId)`
## Mutations
* `ingestVexDocument`
* `translateVex(format)`
* `exportVexDocument(id, targetFormat)`
* `replayBundle(bundleId)`
All responses must include deterministic IDs.
---
# 6. Detailed Developer Tasks by Sprint
## Sprint 1: Foundation
1. Create solution structure.
2. Add Mongo DB contexts.
3. Implement data entities.
4. Implement hashing + canonicalizer.
5. Implement IVexParser interface.
## Sprint 2: Parsers
1. Implement OpenVexParser.
2. Implement CycloneDxParser.
3. Develop strong unit tests for JSON normalization.
## Sprint 3: Signature & Authority
1. DSSE envelope reader.
2. Call Authority to verify signatures.
3. Produce provenance scores.
## Sprint 4: Merge Policy Engine
1. Implement deterministic lattice merge.
2. Unit tests: 20+ collision scenarios.
## Sprint 5: Ingestion Pipeline
1. Implement ingest service end-to-end.
2. Insert/update logic.
3. Add GraphQL endpoints.
## Sprint 6: Translation Layer
1. OpenVEX↔CycloneDX converter.
2. Tests for round-trip.
## Sprint 7: Backfill System
1. Bom-ref resolver worker.
2. Rehashing logic for updated artifacts.
3. Events linking SBOM ingestion to backfill.
## Sprint 8: Air-Gap Bundle
1. BundleIngestService.
2. Manifest verification.
3. Trust root local loading.
## Sprint 9: Hardening
1. Fuzz parsers.
2. Deterministic stress tests.
3. Concurrency validation.
4. Storage compaction.
---
# 7. Failure Handling Matrix
| Failure | Action | Logged? | Retries |
| ------------------- | -------------------------------------- | ------- | ------- |
| Invalid JSON | Reject document | Yes | 0 |
| Invalid schema | Reject | Yes | 0 |
| Signature invalid | Store document, mark signature invalid | Yes | 0 |
| Artifact unresolved | Store unresolved, enqueue backfill | Yes | 3 |
| Merge conflict | Apply policy | Yes | 0 |
| Canonical mismatch | Hard fail | Yes | 0 |
---
# 8. Developer Unit Test Checklist
### Must have tests for:
* Canonicalization stability (100 samples).
* Identical input twice → identical `_id`.
* Parsing OpenVEX with multi-product definitions.
* Parsing CycloneDX with missing bom-refs.
* Merge policy tie-breakers.
* Air-gap replay reproducibility.
* Translation equivalence.
---
# 9. Deliverables for Developers
They must produce:
1. Interfaces + DTOs + document schemas.
2. Canonicalizer with 100% deterministic output.
3. Two production-grade parsers.
4. Signature verification pipeline.
5. Merge policies aligned with Authority trust model.
6. End-to-end ingestion service.
7. Translation layer.
8. Backfill worker.
9. Air-gap bundle script + service.
10. GraphQL APIs.
---
If you want, I can next produce:
* A full **developer handbook** (60-90 pages).
* Full **technical architecture ADRs**.
* A concrete **scaffold**: a `.NET 10` project that compiles cleanly.
* Complete **test suite specification**.
* A **README.md** for new joiners.


@@ -0,0 +1,944 @@
Here's a clean, action-ready blueprint for a **public reachability benchmark** you can stand up quickly and grow over time.
# Why this matters (quick)
“Reachability” asks: *is a flagged vulnerability actually executable from real entry points in this codebase/container?* A public, reproducible benchmark lets you compare tools apples-to-apples, drive research, and keep vendors honest.
# What to collect (dataset design)
* **Projects & languages**
* Polyglot mix: **C/C++ (ELF/PE/Mach-O)**, **Java/Kotlin**, **C#/.NET**, **Python**, **JavaScript/TypeScript**, **PHP**, **Go**, **Rust**.
* For each project: small (≤5k LOC), medium (5-100k), large (100k+).
* **Ground-truth artifacts**
* **Seed CVEs** with known sinks (e.g., deserializers, command exec, SSRF) and **neutral projects** with *no* reachable path (negatives).
* **Exploit oracles**: minimal PoCs or unit tests that (1) reach the sink and (2) toggle reachability via feature flags.
* **Build outputs (deterministic)**
* **Reproducible binaries/bytecode** (strip timestamps; fixed seeds; SOURCE_DATE_EPOCH).
* **SBOM** (CycloneDX/SPDX) + **PURLs** + **Build-ID** (ELF .note.gnu.build-id / PE Authentihash / Mach-O UUID).
* **Attestations**: in-toto/DSSE envelopes recording toolchain versions, flags, hashes.
* **Execution traces (for truth)**
* **CI traces**: call-graph dumps from compilers/analyzers; unit-test coverage; optional **dynamic traces** (eBPF/.NET ETW/Java Flight Recorder).
* **Entrypoint manifests**: HTTP routes, CLI commands, cron/queue consumers.
* **Metadata**
* Language, framework, package manager, compiler versions, OS/container image, optimization level, stripping info, license.
# How to label ground truth
* **Per-vuln case**: `(component, version, sink_id)` with label **reachable / unreachable / unknown**.
* **Evidence bundle**: pointer to (a) static call path, (b) dynamic hit (trace/coverage), or (c) rationale for negative.
* **Confidence**: high (static+dynamic agree), medium (one source), low (heuristic only).
# Scoring (simple + fair)
* **Binary classification** on cases:
* Precision, Recall, F1. Report **AUPR** if you output probabilities.
* **Path quality**
* **Explainability score (0-3)**:
* 0: “vuln reachable” w/o context
* 1: names only (entry→…→sink)
* 2: full interprocedural path w/ locations
* 3: plus **inputs/guards** (taint/constraints, env flags)
* **Runtime cost**
* Wall-clock, peak RAM, image size; normalized by KLOC.
* **Determinism**
* Re-run variance (≤1% is “A”, 1-5% “B”, >5% “C”).
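
To make the determinism grade and the KLOC normalization concrete, here is a small sketch. The blueprint does not say how re-run variance is measured, so the definition below (spread of a metric across re-runs, relative to its mean, in percent) is an assumption, as are the function names. The thresholds used are: at most 1% is "A", above 1% up to 5% is "B", above 5% is "C".

```python
def rerun_variance_pct(values):
    """Relative spread of one metric (e.g. F1) across re-runs, in percent.

    Assumed definition: (max - min) / mean * 100.
    """
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return (max(values) - min(values)) / mean * 100.0


def determinism_grade(variance_pct):
    """Map a re-run variance to the A/B/C bands used by the benchmark."""
    if variance_pct <= 1.0:
        return "A"
    if variance_pct <= 5.0:
        return "B"
    return "C"


def per_kloc(value, lines_of_code):
    """Normalize a cost figure (seconds, MB, ...) by thousands of lines of code."""
    return value / (lines_of_code / 1000.0)
```

For example, a tool that takes 180 s on a 6,000-line project costs `per_kloc(180.0, 6000)` seconds per KLOC, which makes small and large cases comparable on the leaderboard.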
# Avoiding overfitting
* **Train/Dev/Test** splits per language; **hidden test** projects rotated quarterly.
* **Case churn**: introduce **isomorphic variants** (rename symbols, reorder files) to punish memorization.
* **Poisoned controls**: include decoy sinks and unreachable dead-code traps.
* **Submission rules**: require **attestations** of tool versions & flags; limit per-case hints.
# Reference baselines (to run out-of-the-box)
* **Snyk Code/Reachability** (JS/Java/Python, SaaS/CLI).
* **Semgrep + Pro Engine** (rules + reachability mode).
* **CodeQL** (multi-lang, LGTM-style queries).
* **Joern** (C/C++/JVM code property graphs).
* **angr** (binary symbolic exec; selective for native samples).
* **Language-specific**: pip-audit w/ import graphs, npm with lock-tree + route discovery, Maven + call-graph (Soot/WALA).
# Submission format (one JSON per tool run)
```json
{
  "tool": {"name": "YourTool", "version": "1.2.3"},
  "run": {
    "commit": "…",
    "platform": "ubuntu:24.04",
    "time_s": 182.4, "peak_mb": 3072
  },
  "cases": [
    {
      "id": "php-shop:fastjson@1.2.68:Sink#deserialize",
      "prediction": "reachable",
      "confidence": 0.88,
      "explain": {
        "entry": "POST /api/orders",
        "path": [
          "OrdersController::create",
          "Serializer::deserialize",
          "Fastjson::parseObject"
        ],
        "guards": ["feature.flag.json_enabled==true"]
      }
    }
  ],
  "artifacts": {
    "sbom": "sha256:…", "attestation": "sha256:…"
  }
}
```
# Folder layout (repo)
```
/benchmark
  /cases/<lang>/<project>/<case_id>/
    case.yaml          # component@version, sink, labels, evidence refs
    entrypoints.yaml   # routes/CLIs/cron
    build/             # Dockerfiles, lockfiles, pinned toolchains
    outputs/           # SBOMs, binaries, traces (checksummed)
  /splits/{train,dev,test}.txt
  /schemas/{case.json,submission.json}
  /scripts/{build.sh, run_tests.sh, score.py}
  /docs/ (how-to, FAQs, T&Cs)
```
# Minimal **v1** (4-6 weeks of work)
1. **Languages**: JS/TS, Python, Java, C (ELF).
2. **20-30 cases**: mix of reachable/unreachable with PoC unit tests.
3. **Deterministic builds** in containers; publish SBOM+attestations.
4. **Scorer**: precision/recall/F1 + explainability, runtime, determinism.
5. **Baselines**: run CodeQL + Semgrep across all; Snyk where feasible; angr for 3 native cases.
6. **Website**: static leaderboard (per-lang, per-size), download links, submission guide.
# V2+ (quarterly)
* Add **.NET, PHP, Go, Rust**; broaden binary focus (PE/Mach-O).
* Add **dynamic traces** (eBPF/ETW/JFR) and **taint oracles**.
* Introduce **config-gated reachability** (feature flags, env, k8s secrets).
* Add **dataset cards** per case (threat model, CWE, false-positive traps).
# Publishing & governance
* License: **CC-BY-SA** for metadata, **source-compatible OSS** for code, binaries under original licenses.
* **Repro packs**: `benchmark-kit.tgz` with container recipes, hashes, and attestations.
* **Disclosure**: CVE hygiene, responsible use, opt-out path for upstreams.
* **Stewards**: small TAC (you + two external reviewers) to approve new cases and adjudicate disputes.
# Immediate next steps (checklist)
* Lock the **schemas** (case + submission + attestation fields).
* Pick 8 seed projects (2 per language tiered by size).
* Draft 12 sink-cases (6 reachable, 6 unreachable) with unit-test oracles.
* Script deterministic builds and **hash-locked SBOMs**.
* Implement the scorer; publish a **starter leaderboard** with 2 baselines.
* Ship **v1 website/docs** and open submissions.
If you want, I can generate the repo scaffold (folders, YAML/JSON schemas, Dockerfiles, scorer script) so your team can `git clone` and start adding cases immediately.
Cool, let's turn the blueprint into a concrete, developer-friendly implementation plan.
I'll assume **v1 scope** is:
* Languages: **JavaScript/TypeScript (Node)**, **Python**, **Java**, **C (ELF)**
* ~**20-30 cases** total (reachable/unreachable mix)
* Baselines: **CodeQL**, **Semgrep**, maybe **Snyk** where licenses allow, and **angr** for a few native cases
You can expand later, but this plan is enough to get v1 shipped.
---
## 0. Overall project structure & ownership
**Owners**
* **Tech Lead**: owns architecture & final decisions
* **Benchmark Core**: 2-3 devs building schemas, scorer, infra
* **Language Tracks**: 1 dev per language (JS, Python, Java, C)
* **Website/Docs**: 1 dev
**Repo layout (target)**
```text
reachability-benchmark/
  README.md
  LICENSE
  CONTRIBUTING.md
  CODE_OF_CONDUCT.md
  benchmark/
    cases/
      js/
        express-blog/
          case-001/
            case.yaml
            entrypoints.yaml
            build/
              Dockerfile
              build.sh
            src/       # project source (or submodule)
            tests/     # unit tests as oracles
            outputs/
              sbom.cdx.json
              binary.tar.gz
              coverage.json
              traces/  # optional dynamic traces
      py/
        flask-api/...
      java/
        spring-app/...
      c/
        httpd-like/...
    schemas/
      case.schema.yaml
      entrypoints.schema.yaml
      truth.schema.yaml
      submission.schema.json
    tools/
      scorer/
        rb_score/
          __init__.py
          cli.py
          metrics.py
          loader.py
          explainability.py
        pyproject.toml
        tests/
      build/
        build_all.py
        validate_builds.py
  baselines/
    codeql/
      run_case.sh
      config/
    semgrep/
      run_case.sh
      rules/
    snyk/
      run_case.sh
    angr/
      run_case.sh
  ci/
    github/
      benchmark.yml
  website/
    # static site / leaderboard
```
---
## 1. Phase 1 Repo & infra setup
### Task 1.1 Create repository
**Developer:** Tech Lead
**Deliverables:**
* Repo created (`reachability-benchmark` or similar)
* `LICENSE` (e.g., Apache-2.0 or MIT)
* Basic `README.md` describing:
* Purpose (public reachability benchmark)
* High-level design
* v1 scope (langs, #cases)
### Task 1.2 Bootstrap structure
**Developer:** Benchmark Core
Create directory skeleton as above (without filling everything yet).
Add:
```bash
# benchmark/Makefile
.PHONY: test lint build
test:
	pytest benchmark/tools/scorer/tests
lint:
	black benchmark/tools/scorer
	flake8 benchmark/tools/scorer
build:
	python benchmark/tools/build/build_all.py
```
### Task 1.3 Coding standards & tooling
**Developer:** Benchmark Core
* Add `.editorconfig`, `.gitignore`, and Python tool configs (`ruff`, `black`, or `flake8`).
* Define minimal **PR checklist** in `CONTRIBUTING.md`:
* Tests pass
* Lint passes
* New schemas have JSON schema or YAML schema and tests
* New cases come with oracles (tests/coverage)
---
## 2. Phase 2 Case & submission schemas
### Task 2.1 Define case metadata format
**Developer:** Benchmark Core
Create `benchmark/schemas/case.schema.yaml` and an example `case.yaml`.
**Example `case.yaml`**
```yaml
id: "js-express-blog:001"
language: "javascript"
framework: "express"
size: "small" # small | medium | large
component:
  name: "express-blog"
  version: "1.0.0-bench"
vulnerability:
  cve: "CVE-XXXX-YYYY"
  cwe: "CWE-502"
  description: "Unsafe deserialization via user-controlled JSON."
  sink_id: "Deserializer::parse"
ground_truth:
  label: "reachable" # reachable | unreachable | unknown
  confidence: "high" # high | medium | low
  evidence_files:
    - "truth.yaml"
  notes: >
    Unit test test_reachable_deserialization triggers the sink.
build:
  dockerfile: "build/Dockerfile"
  build_script: "build/build.sh"
  output:
    artifact_path: "outputs/binary.tar.gz"
    sbom_path: "outputs/sbom.cdx.json"
    coverage_path: "outputs/coverage.json"
    traces_dir: "outputs/traces"
environment:
  os_image: "ubuntu:24.04"
  compiler: null
  runtime:
    node: "20.11.0"
  source_date_epoch: 1730000000
```
**Acceptance criteria**
* Schema validates sample `case.yaml` with a Python script:
* `benchmark/tools/build/validate_schema.py` using `jsonschema` or `pykwalify`.
---
### Task 2.2 Entry points schema
**Developer:** Benchmark Core
`benchmark/schemas/entrypoints.schema.yaml`
**Example `entrypoints.yaml`**
```yaml
entries:
  http:
    - id: "POST /api/posts"
      route: "/api/posts"
      method: "POST"
      handler: "PostsController.create"
  cli:
    - id: "generate-report"
      command: "node cli.js generate-report"
      description: "Generates summary report."
  scheduled:
    - id: "daily-cleanup"
      schedule: "0 3 * * *"
      handler: "CleanupJob.run"
```
---
### Task 2.3 Ground truth / truth schema
**Developer:** Benchmark Core + Language Tracks
`benchmark/schemas/truth.schema.yaml`
**Example `truth.yaml`**
```yaml
id: "js-express-blog:001"
cases:
  - sink_id: "Deserializer::parse"
    label: "reachable"
    dynamic_evidence:
      covered_by_tests:
        - "tests/test_reachable_deserialization.js::should_reach_sink"
      coverage_files:
        - "outputs/coverage.json"
    static_evidence:
      call_path:
        - "POST /api/posts"
        - "PostsController.create"
        - "PostsService.createFromJson"
        - "Deserializer.parse"
    config_conditions:
      - "process.env.FEATURE_JSON_ENABLED == 'true'"
    notes: "If FEATURE_JSON_ENABLED=false, path is unreachable."
```
---
### Task 2.4 Submission schema
**Developer:** Benchmark Core
`benchmark/schemas/submission.schema.json`
**Shape**
```json
{
  "tool": { "name": "YourTool", "version": "1.2.3" },
  "run": {
    "commit": "abcd1234",
    "platform": "ubuntu:24.04",
    "time_s": 182.4,
    "peak_mb": 3072
  },
  "cases": [
    {
      "id": "js-express-blog:001",
      "prediction": "reachable",
      "confidence": 0.88,
      "explain": {
        "entry": "POST /api/posts",
        "path": [
          "PostsController.create",
          "PostsService.createFromJson",
          "Deserializer.parse"
        ],
        "guards": [
          "process.env.FEATURE_JSON_ENABLED === 'true'"
        ]
      }
    }
  ],
  "artifacts": {
    "sbom": "sha256:...",
    "attestation": "sha256:..."
  }
}
```
Write Python validation utility:
```bash
python benchmark/tools/scorer/validate_submission.py submission.json
```
**Acceptance criteria**
* Validation fails on missing fields / wrong enum values.
* At least two sample submissions pass validation (e.g., “perfect” and “random baseline”).
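
A dependency-free sketch of what `validate_submission.py` might check is shown below. It is not a replacement for validating against `submission.schema.json`; the set of allowed `prediction` values (including `unknown`) and the choice of which fields are mandatory are assumptions made for the example.

```python
# Assumed enum: the plan lists these as ground-truth labels; whether
# "unknown" is a legal prediction is not stated.
ALLOWED_PREDICTIONS = {"reachable", "unreachable", "unknown"}


def validate_submission(doc: dict) -> list[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    for key in ("tool", "run", "cases"):
        if key not in doc:
            errors.append(f"missing top-level field: {key}")
    for key in ("name", "version"):
        if key not in doc.get("tool", {}):
            errors.append(f"missing tool.{key}")
    for index, case in enumerate(doc.get("cases", [])):
        where = f"cases[{index}]"
        if "id" not in case:
            errors.append(f"{where}: missing id")
        if case.get("prediction") not in ALLOWED_PREDICTIONS:
            errors.append(f"{where}: prediction must be one of {sorted(ALLOWED_PREDICTIONS)}")
        confidence = case.get("confidence")
        if confidence is not None:
            is_number = isinstance(confidence, (int, float))
            if not is_number or not 0.0 <= confidence <= 1.0:
                errors.append(f"{where}: confidence must be a number in [0, 1]")
    return errors
```

Collecting every error instead of stopping at the first one gives submitters a complete report in a single run.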
---
## 3. Phase 3 Reference projects & deterministic builds
### Task 3.1 Select and vendor v1 projects
**Developer:** Tech Lead + Language Tracks
For each language, choose:
* 1 small toy app (simple web or CLI)
* 1 medium app (more routes, multiple modules)
* Optional: 1 large (for performance stress tests)
Add them under `benchmark/cases/<lang>/<project>/src/`
(or as git submodules if you want to track upstream).
---
### Task 3.2 Deterministic Docker build per project
**Developer:** Language Tracks
For each project:
* Create `build/Dockerfile`
* Create `build/build.sh` that:
* Builds the app
* Produces artifacts
* Generates SBOM and attestation
**Example `build/Dockerfile` (Node)**
```dockerfile
FROM node:20.11-slim
ENV NODE_ENV=production
ENV SOURCE_DATE_EPOCH=1730000000
WORKDIR /app
COPY src/ /app
COPY package.json package-lock.json /app/
RUN npm ci --ignore-scripts && \
    npm run build --if-present
CMD ["node", "server.js"]
```
**Example `build.sh`**
```bash
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(dirname "$(readlink -f "$0")")/.."
OUT_DIR="$ROOT_DIR/outputs"
mkdir -p "$OUT_DIR"
IMAGE_TAG="rb-js-express-blog:1"
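# Use the case root as build context so the Dockerfile can COPY src/
docker build -f "$ROOT_DIR/build/Dockerfile" -t "$IMAGE_TAG" "$ROOT_DIR"
# Export image as tarball (binary artifact); -n keeps the timestamp out of the gzip header
docker save "$IMAGE_TAG" | gzip -n > "$OUT_DIR/binary.tar.gz"
# Generate SBOM (e.g. via syft); can be an optional stub for v1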
syft packages "docker:$IMAGE_TAG" -o cyclonedx-json > "$OUT_DIR/sbom.cdx.json"
# In future: generate in-toto attestations
```
---
### Task 3.3 Determinism checker
**Developer:** Benchmark Core
`benchmark/tools/build/validate_builds.py`:
* For each case:
* Run `build.sh` twice
* Compare hashes of `outputs/binary.tar.gz` and `outputs/sbom.cdx.json`
* Fail if hashes differ.
**Acceptance criteria**
* All v1 cases produce identical artifacts across two builds on CI.
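
The core of `validate_builds.py` might be sketched as follows. The function names are hypothetical, and the `build` parameter is an addition made here so the check can be exercised without Docker; by default it shells out to the case's `build/build.sh`.

```python
import hashlib
import subprocess
from pathlib import Path

# Artifacts compared between the two builds, relative to the case directory.
ARTIFACTS = ("outputs/binary.tar.gz", "outputs/sbom.cdx.json")


def run_build_script(case_dir: Path) -> None:
    """Default builder: run the case's build.sh and fail loudly on errors."""
    subprocess.run(["bash", str(case_dir / "build" / "build.sh")], check=True)


def artifact_hashes(case_dir: Path) -> dict[str, str]:
    """SHA-256 of every tracked artifact, keyed by relative path."""
    return {
        rel: hashlib.sha256((case_dir / rel).read_bytes()).hexdigest()
        for rel in ARTIFACTS
    }


def build_is_deterministic(case_dir: Path, build=run_build_script) -> bool:
    """Build twice and report whether all artifact hashes are identical."""
    build(case_dir)
    first = artifact_hashes(case_dir)
    build(case_dir)
    second = artifact_hashes(case_dir)
    return first == second
```

A CI job would call `build_is_deterministic` for each case directory and fail the run on the first `False`.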
---
## 4. Phase 4 Ground truth oracles (tests & traces)
### Task 4.1 Add unit/integration tests for reachable cases
**Developer:** Language Tracks
For each **reachable** case:
* Add `tests/` under the project to:
* Start the app (if necessary)
* Send a request/trigger that reaches the vulnerable sink
* Assert that a sentinel side effect occurs (e.g. log or marker file) instead of real exploitation.
Example for Node using Jest:
```js
test("should reach deserialization sink", async () => {
  const res = await request(app)
    .post("/api/posts")
    .send({ title: "x", body: '{"__proto__":{}}' });
  expect(res.statusCode).toBe(200);
  // Sink logs "REACH_SINK"; we check the log or a variable
  expect(sinkWasReached()).toBe(true);
});
```
### Task 4.2 Instrument coverage
**Developer:** Language Tracks
* For each language, pick a coverage tool:
* JS: `nyc` + `istanbul`
* Python: `coverage.py`
* Java: `jacoco`
* C: `gcov`/`llvm-cov` (optional for v1)
* Ensure running tests produces `outputs/coverage.json` or `.xml` that we then convert to a simple JSON format:
```json
{
  "files": {
    "src/controllers/posts.js": {
      "lines_covered": [12, 13, 14, 27],
      "lines_total": 40
    }
  }
}
```
Create a small converter script if needed.
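
For the Python track, such a converter could be as small as the sketch below. It assumes the input is the report written by `coverage json` (coverage.py), which lists `executed_lines` and `missing_lines` per file, and it interprets `lines_total` as the number of executable lines (executed plus missing); both the function name and that interpretation are assumptions.

```python
def simplify_coverage(report: dict) -> dict:
    """Convert a coverage.py JSON report into the benchmark's simple format."""
    files = {}
    # Sort paths so the converted file is byte-stable across runs.
    for path in sorted(report.get("files", {})):
        entry = report["files"][path]
        executed = sorted(entry.get("executed_lines", []))
        missing = entry.get("missing_lines", [])
        files[path] = {
            "lines_covered": executed,
            # Assumption: "total" means executable lines, not physical lines.
            "lines_total": len(executed) + len(missing),
        }
    return {"files": files}
```

The other languages would need their own adapters (Istanbul JSON, JaCoCo XML, gcov output) that emit the same shape.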
### Task 4.3 Optional dynamic traces
If you want richer evidence:
* JS: add middleware that logs `(entry_id, handler, sink)` triples to `outputs/traces/traces.json`
* Python: similar using decorators
* C/Java: out of scope for v1 unless you want to invest extra time.
---
## 5. Phase 5 Scoring tool (CLI)
### Task 5.1 Implement `rb-score` library + CLI
**Developer:** Benchmark Core
Create `benchmark/tools/scorer/rb_score/` with:
* `loader.py`
* Load all `case.yaml`, `truth.yaml` into memory.
* Provide functions: `load_cases() -> Dict[case_id, Case]`.
* `metrics.py`
* Implement:
* `compute_precision_recall(truth, predictions)`
* `compute_path_quality_score(explain_block)` (0-3)
* `compute_runtime_stats(run_block)`
* `cli.py`
* CLI:
```bash
rb-score \
--cases-root benchmark/cases \
--submission submissions/mytool.json \
--output results/mytool_results.json
```
**Pseudo-code for core scoring**
```python
def score_submission(truth, submission):
    # truth: dict of case_id -> case object with a .label attribute
    y_true = []
    y_pred = []
    per_case_scores = {}
    for case_id, case in truth.items():
        gt = case.label  # reachable/unreachable
        pred_case = find_pred_case(submission.cases, case_id)
        # A case missing from the submission counts as "unreachable"
        pred_label = pred_case.prediction if pred_case else "unreachable"
        y_true.append(gt == "reachable")
        y_pred.append(pred_label == "reachable")
        explain_score = explainability(pred_case.explain if pred_case else None)
        per_case_scores[case_id] = {
            "gt": gt,
            "pred": pred_label,
            "explainability": explain_score,
        }
    precision, recall, f1 = compute_prf(y_true, y_pred)
    return {
        "summary": {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "num_cases": len(truth),
        },
        "cases": per_case_scores,
    }
```
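
The pseudo-code relies on a `compute_prf` helper that is not shown. A minimal sketch of it follows, treating "reachable" as the positive class. The `zero_division` parameter is a hypothetical addition: it fixes the value reported when a ratio is undefined, for example precision when the tool predicts nothing as reachable.

```python
def compute_prf(y_true, y_pred, zero_division=0.0):
    """Precision, recall and F1 for boolean labels (True = reachable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    # Undefined ratios (0/0) fall back to the chosen convention.
    precision = tp / (tp + fp) if (tp + fp) else zero_division
    recall = tp / (tp + fn) if (tp + fn) else zero_division
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Whichever convention is chosen for the undefined case, it should be documented next to the leaderboard, because it changes how an "everything is unreachable" submission is ranked.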
### Task 5.2 Explainability scoring rules
**Developer:** Benchmark Core
Implement `explainability(explain)`:
* 0: `explain` missing or `path` empty
* 1: `path` present with at least 2 nodes (sink + one function)
* 2: `path` contains:
* Entry label (HTTP route/CLI id)
* ≥3 nodes (entry → … → sink)
* 3: Level 2 plus `guards` list non-empty
Unit tests for at least 4 scenarios.
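
One possible reading of these rules, written out as code, is given below. Two points are assumptions: a one-node path scores 0 (the rules do not cover it), and the "≥3 nodes" requirement is counted on the `path` list alone, with the entry label taken from the separate `entry` field as in the submission example.

```python
def explainability(explain):
    """Score an `explain` block from 0 (no usable path) to 3 (path plus guards)."""
    if not explain:
        return 0
    path = explain.get("path") or []
    if len(path) < 2:
        return 0  # assumption: a single node is not yet a path
    has_entry = bool(explain.get("entry"))
    if not has_entry or len(path) < 3:
        return 1  # names only
    # Level 2 reached; a non-empty guards list lifts it to level 3.
    return 3 if explain.get("guards") else 2
```

The branches map directly onto the unit-test scenarios asked for above: missing block, short path, full path with entry, and full path with guards.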
### Task 5.3 Regression tests for scoring
Add small test fixture:
* Tiny synthetic benchmark: 3 cases, 2 reachable, 1 unreachable.
* 3 submissions:
* Perfect
* All reachable
* All unreachable
Assertions:
* Perfect: `precision=1, recall=1`
* All reachable: `recall=1, precision<1`
* All unreachable: `recall=0`; precision is 0/0 here (no positive predictions), so fix a convention in the scorer (e.g. report `precision=1`) and assert it
---
## 6. Phase 6 Baseline integrations
### Task 6.1 Semgrep baseline
**Developer:** Benchmark Core (with Semgrep experience)
* `baselines/semgrep/run_case.sh`:
* Inputs: `case_id`, `cases_root`, `output_path`
* Steps:
* Find `src/` for case
* Run `semgrep --config auto` or curated rules
* Convert Semgrep findings into benchmark submission format:
* Map Semgrep rules → vulnerability types → candidate sinks
* Heuristically guess reachability (for v1, maybe always “reachable” if sink in code path)
* Output: `output_path` JSON conforming to `submission.schema.json`.
### Task 6.2 CodeQL baseline
* Create CodeQL databases for each project (likely via `codeql database create`).
* Create queries targeting known sinks (e.g., `Deserialization`, `CommandInjection`).
* `baselines/codeql/run_case.sh`:
* Build DB (or reuse)
* Run queries
* Translate results into our submission format (again as heuristic reachability).
### Task 6.3 Optional Snyk / angr baselines
* Snyk:
* Use `snyk test` on the project
* Map results to dependencies & known CVEs
* For v1, just mark as `reachable` if Snyk reports a reachable path (if available).
* angr:
* For 1-2 small C samples, configure simple analysis script.
**Acceptance criteria**
* For at least 5 cases (across languages), the baselines produce valid submission JSON.
* `rb-score` runs and yields metrics without errors.
---
## 7. Phase 7 CI/CD
### Task 7.1 GitHub Actions workflow
**Developer:** Benchmark Core
`ci/github/benchmark.yml`:
Jobs:
1. `lint-and-test`
* `python -m pip install -e benchmark/tools/scorer[dev]`
* `make lint`
* `make test`
2. `build-cases`
* `python benchmark/tools/build/build_all.py`
* Run `validate_builds.py`
3. `smoke-baselines`
* For 2-3 cases, run Semgrep/CodeQL wrappers and ensure they emit valid submissions.
### Task 7.2 Artifact upload
* Upload `outputs/` tarball from `build-cases` as workflow artifacts.
* Upload `results/*.json` from scoring runs.
---
## 8. Phase 8 Website & leaderboard
### Task 8.1 Define results JSON format
**Developer:** Benchmark Core + Website dev
`results/leaderboard.json`:
```json
{
  "tools": [
    {
      "name": "Semgrep",
      "version": "1.60.0",
      "summary": {
        "precision": 0.72,
        "recall": 0.48,
        "f1": 0.58
      },
      "by_language": {
        "javascript": {"precision": 0.80, "recall": 0.50, "f1": 0.62},
        "python": {"precision": 0.65, "recall": 0.45, "f1": 0.53}
      }
    }
  ]
}
```
CLI option to generate this:
```bash
rb-score compare \
--cases-root benchmark/cases \
--submissions submissions/*.json \
--output results/leaderboard.json
```
### Task 8.2 Static site
**Developer:** Website dev
Tech choice: any static framework (Next.js, Astro, Docusaurus, or even pure HTML+JS).
Pages:
* **Home**
* What is reachability?
* Summary of benchmark
* **Leaderboard**
* Renders `leaderboard.json`
* Filters: language, case size
* **Docs**
* How to run benchmark locally
* How to prepare a submission
Add a simple script to copy `results/leaderboard.json` into `website/public/` for publishing.
---
## 9. Phase 9 Docs, governance, and contribution flow
### Task 9.1 CONTRIBUTING.md
Include:
* How to add a new case:
* Step-by-step:
1. Create project folder under `benchmark/cases/<lang>/<project>/case-XXX/`
2. Add `case.yaml`, `entrypoints.yaml`, `truth.yaml`
3. Add oracles (tests, coverage)
4. Add deterministic `build/` assets
5. Run local tooling:
* `validate_schema.py`
* `validate_builds.py --case <id>`
* Example PR description template.
### Task 9.2 Governance doc
* Define **Technical Advisory Committee (TAC)** roles:
* Approve new cases
* Approve schema changes
* Manage hidden test sets (future phase)
* Define **release cadence**:
* v1.0 with public cases
* Quarterly updates with new hidden cases.
---
## 10. Suggested milestone breakdown (for planning / sprints)
### Milestone 1: Foundation (1-2 sprints)
* Repo scaffolding (Tasks 1.x)
* Schemas (Tasks 2.x)
* Two tiny toy cases (one JS, one Python) with:
* `case.yaml`, `entrypoints.yaml`, `truth.yaml`
* Deterministic build
* Basic unit tests
* Minimal `rb-score` with:
* Case loading
* Precision/recall only
**Exit:** You can run `rb-score` on a dummy submission for 2 cases.
---
### Milestone 2: v1 dataset (2-3 sprints)
* Add ~20-30 cases across JS, Python, Java, C
* Ground truth & coverage for each
* Deterministic builds validated
* Explainability scoring implemented
* Regression tests for `rb-score`
**Exit:** Full scoring tool stable; dataset repeatably builds on CI.
---
### Milestone 3: Baselines & site (1-2 sprints)
* Semgrep + CodeQL baselines producing valid submissions
* CI running smoke baselines
* `leaderboard.json` generator
* Static website with public leaderboard and docs
**Exit:** Public v1 benchmark you can share with external tool authors.
---
If you tell me which stack your team prefers for the site (React, plain HTML, SSG, etc.) or which CI you're on, I can adapt this into concrete config files (e.g., a full GitHub Actions workflow, Next.js scaffold, or exact `pyproject.toml` for `rb-score`).


@@ -0,0 +1,602 @@
Here's a simple, low-friction way to keep priorities fresh without constant manual grooming: **let confidence decay over time**.
![A small curve sloping down over time, illustrating exponential decay](https://dummyimage.com/800x250/ffffff/000000\&text=confidence\(t\)%20=%20e^{-t/τ})
# Exponential confidence decay (what & why)
* **Idea:** Every item (task, lead, bug, doc, hypothesis) has a confidence score that **automatically shrinks with time** if you don't touch it.
* **Formula:** `confidence(t) = e^(-t/τ)` where `t` is days since last signal (edit, comment, commit, new data), and **τ (“tau”)** is the decay constant.
* **Rule of thumb:** With **τ = 30 days**, at **t = 30** the confidence is **e^(-1) ≈ 0.37**, about a **63% drop**. This surfaces long-ignored items *gradually*, not with harsh “stale/expired” flips.
# How to use it in practice
* **Signals that reset t → 0:** comment on the ticket, new benchmark, fresh log sample, doc update, CI run, new market news.
* **Sort queues by:** `priority × confidence(t)` (or severity × confidence). Quiet items drift down; truly active ones stay up.
* **Escalation bands:**
* `>0.6` = green (recently touched)
* `0.3-0.6` = amber (review soon)
* `<0.3` = red (poke or close)
# Quick presets
* **Fast-moving queues (incidents, hot leads):** τ = **7-14** days
* **Engineering tasks / product docs:** τ = **30** days
* **Research bets / roadmaps:** τ = **60-90** days
# For your world (StellaOps + ops/dev work)
* **Vuln tickets:** `risk_score = CVSS × reachability × e^(-t/30)`
* **Roadmap epics:** `value_score = impact × e^(-t/60)` to re-rank quarterly.
* **Docs:** show a badge “freshness: 42%” derived from last edit age to nudge updates.
# Minimal implementation sketch
* Store per-item: `last_signal_at`, `base_priority`.
* Compute on read:
```
days = (now - last_signal_at).days
conf = exp(-days / tau)
score = base_priority * conf
```
* Recompute in your API layer or materialize nightly; no cron spam needed.
If you want, I can draft a tiny C# helper (and SQL snippet) you can drop into your issue service to add `confidence(t)` and color bands to your lists.
Perfect, let's turn the idea into something your devs can actually build.
Below is an implementation plan you can drop into a ticket or PRD, with clear phases, data model changes, APIs, and some sample code (C# + SQL). I'll also sprinkle in StellaOps-specific notes.
---
## 0. Scope & Objectives
**Goal:** Introduce `confidence(t)` as an automatic freshness factor that decays with time and is used to rank and highlight work.
We'll apply it to:
* Vulnerabilities (StellaOps)
* General issues / tasks / epics
* (Optional) Docs, leads, hypotheses later
**Core behavior:**
* Each item has:
* A base priority / risk (from severity, business impact, etc.)
* A timestamp of last signal (meaningful activity)
* A decay rate τ (tau) in days
* Effective priority = `base_priority × confidence(t)`
* `confidence(t) = exp(-t / τ)` where `t` = days since last_signal
---
## 1. Data Model Changes
### 1.1. Add fields to core “work item” tables
For each relevant table (`Issues`, `Vulnerabilities`, `Epics`, …):
**New columns:**
* `base_priority` (FLOAT or INT)
* Example: 1–100, or derived from severity.
* `last_signal_at` (TIMESTAMPTZ, NOT NULL; equals `created_at` for a new item)
* `tau_days` (FLOAT, nullable, falls back to type default)
* (Optional) `confidence_cached` (FLOAT, for materialized score)
* (Optional) `is_confidence_frozen` (BOOL, default FALSE)
For pinned items that should not decay.
**Example Postgres migration (Issues):**
```sql
ALTER TABLE issues
ADD COLUMN base_priority DOUBLE PRECISION,
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ADD COLUMN tau_days DOUBLE PRECISION,
ADD COLUMN confidence_cached DOUBLE PRECISION,
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
```
For StellaOps:
```sql
ALTER TABLE vulnerabilities
ADD COLUMN base_risk DOUBLE PRECISION,
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ADD COLUMN tau_days DOUBLE PRECISION,
ADD COLUMN confidence_cached DOUBLE PRECISION,
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
```
### 1.2. Add a config table for τ per entity type
```sql
CREATE TABLE confidence_decay_config (
id SERIAL PRIMARY KEY,
entity_type TEXT NOT NULL UNIQUE, -- 'incident', 'vulnerability', 'issue', 'epic', 'doc'
tau_days_default DOUBLE PRECISION NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
INSERT INTO confidence_decay_config (entity_type, tau_days_default) VALUES
('incident', 7),
('vulnerability', 30),
('issue', 30),
('epic', 60),
('doc', 90);
```
---
## 2. Define “signal” events & instrumentation
We need a standardized way to say: “this item got activity → reset last_signal_at”.
### 2.1. Signals that should reset `last_signal_at`
For **issues / epics:**
* New comment
* Status change (e.g., Open → In Progress)
* Field change that matters (severity, owner, milestone)
* Attachment added
* Link to PR added or updated
* New CI failure linked
For **vulnerabilities (StellaOps):**
* New scanner result attached or status updated (e.g., “Verified”, “False Positive”)
* New evidence (PoC, exploit notes)
* SLA override change
* Assignment / ownership change
* Integration events (e.g., PR merge that references the vuln)
For **docs (if you do it):**
* Any edit
* Comment/annotation
### 2.2. Implement a shared helper to record a signal
**Service-level helper (pseudocode / C#-ish):**
```csharp
public interface IConfidenceSignalService
{
Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null);
}
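public class ConfidenceSignalService : IConfidenceSignalService
{
private readonly IWorkItemRepository _repo;
private readonly IConfidenceConfigService _config;
// Dependencies are injected; the readonly fields have to be assigned here.
public ConfidenceSignalService(IWorkItemRepository repo, IConfidenceConfigService config)
{
_repo = repo ?? throw new ArgumentNullException(nameof(repo));
_config = config ?? throw new ArgumentNullException(nameof(config));
}
public async Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null)
{
// Callers may pass the event time; otherwise use the current UTC time.
var now = signalTimeUtc ?? DateTime.UtcNow;
var item = await _repo.GetByIdAsync(type, itemId);
if (item == null) return; // unknown item: nothing to refresh
item.LastSignalAt = now;
// Pin the type default the first time an item without its own tau gets a signal.
if (item.TauDays == null)
{
item.TauDays = await _config.GetDefaultTauAsync(type);
}
await _repo.UpdateAsync(item);
}
}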
```
### 2.3. Wire signals into existing flows
Create small tasks for devs like:
* **ISS-01:** Call `RecordSignalAsync` on:
* New issue comment handler
* Issue status update handler
* Issue field update handler (severity/priority/owner)
* **VULN-01:** Call `RecordSignalAsync` when:
* New scanner result ingested for a vuln
* Vulnerability status, SLA, or owner changes
* New exploit evidence is attached
---
## 3. Confidence & scoring calculation
### 3.1. Shared confidence function
Definition:
```csharp
public static class ConfidenceMath
{
// t = days since last signal
public static double ConfidenceScore(DateTime lastSignalAtUtc, double tauDays, DateTime? nowUtc = null)
{
var now = nowUtc ?? DateTime.UtcNow;
var tDays = (now - lastSignalAtUtc).TotalDays;
if (tDays <= 0) return 1.0;
if (tauDays <= 0) return 1.0; // guard / fallback
var score = Math.Exp(-tDays / tauDays);
// Optional: never drop below a tiny floor, so items never "disappear"
const double floor = 0.01;
return Math.Max(score, floor);
}
}
```
### 3.2. Effective priority formulas
**Generic issues / tasks:**
```csharp
double effectiveScore = issue.BasePriority * ConfidenceMath.ConfidenceScore(issue.LastSignalAt, issue.TauDays ?? defaultTau);
```
**Vulnerabilities (StellaOps):**
Let's define:
* `severity_weight`: map CVSS or severity string to numeric (e.g. Critical=100, High=80, Medium=50, Low=20).
* `reachability`: 0–1 (e.g. from your reachability analysis).
* `exploitability`: 0–1 (optional, based on known exploits).
* `confidence`: as above.
```csharp
double baseRisk = severityWeight * reachability * exploitability; // or simpler: severityWeight * reachability
double conf = ConfidenceMath.ConfidenceScore(vuln.LastSignalAt, vuln.TauDays ?? defaultTau);
double effectiveRisk = baseRisk * conf;
```
Store `baseRisk` → `vulnerabilities.base_risk`, and compute `effectiveRisk` on the fly or via job.
### 3.3. SQL implementation (optional for server-side sorting)
**Postgres example:**
```sql
-- t_days = age in days
-- tau = tau_days
-- score = exp(-t_days / tau)
SELECT
i.*,
i.base_priority *
GREATEST(
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
0.01
) AS effective_priority
FROM issues i
ORDER BY effective_priority DESC;
```
You can wrap that in a view:
```sql
CREATE VIEW issues_with_confidence AS
SELECT
i.*,
GREATEST(
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
0.01
) AS confidence,
i.base_priority *
GREATEST(
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
0.01
) AS effective_priority
FROM issues i;
```
---
## 4. Caching & performance
You have two options:
### 4.1. Compute on read (simplest to start)
* Use the helper function in your service layer or a DB view.
* Pros:
* No jobs, always fresh.
* Cons:
* Slight CPU cost on heavy lists.
**Plan:** Start with this. If you see perf issues, move to 4.2.
### 4.2. Periodic materialization job (optional later)
Add a scheduled job (e.g. hourly) that:
1. Selects all active items.
2. Computes `confidence_score` and `effective_priority`.
3. Writes to `confidence_cached` and `effective_priority_cached` (if you add such a column).
Service then sorts by cached values.
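The materialization job described above could look roughly like the following Python sketch. It uses an in-process SQLite connection purely so the example is self-contained; the `state` column, the `effective_priority_cached` column, and the ISO-8601 text timestamps are assumptions, and a real job would target your actual database.

```python
import math
import sqlite3
from datetime import datetime, timezone

FLOOR = 0.01              # same floor as the on-read calculation
DEFAULT_TAU_DAYS = 30.0   # fallback when an item has no tau of its own


def materialize(conn: sqlite3.Connection, now: datetime) -> int:
    """Recompute cached confidence and effective priority for open issues."""
    rows = conn.execute(
        "SELECT id, base_priority, last_signal_at, tau_days "
        "FROM issues WHERE state = 'open'"
    ).fetchall()
    updates = []
    for item_id, base_priority, last_signal_at, tau_days in rows:
        last = datetime.fromisoformat(last_signal_at)
        age_days = (now - last).total_seconds() / 86400
        tau = tau_days if tau_days and tau_days > 0 else DEFAULT_TAU_DAYS
        conf = 1.0 if age_days <= 0 else max(math.exp(-age_days / tau), FLOOR)
        updates.append((conf, base_priority * conf, item_id))
    conn.executemany(
        "UPDATE issues SET confidence_cached = ?, effective_priority_cached = ? "
        "WHERE id = ?",
        updates,
    )
    conn.commit()
    return len(updates)
```

Passing `now` in explicitly keeps one run internally consistent and makes the job easy to test.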
---
## 5. Backfill & migration
### 5.1. Initial backfill script
For existing records:
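* Set `last_signal_at` to `created_at`. Because the migration adds the column as `NOT NULL DEFAULT NOW()`, existing rows hold the migration time rather than NULL, so a `WHERE last_signal_at IS NULL` filter would match nothing.
* Derive `base_priority` / `base_risk` from existing severity fields.
* Set `tau_days` from config.
**Example:**
```sql
-- Run once, before any signal handlers are wired in.
UPDATE issues
SET last_signal_at = created_at
WHERE last_signal_at > created_at;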
UPDATE issues
SET base_priority = CASE severity
WHEN 'critical' THEN 100
WHEN 'high' THEN 80
WHEN 'medium' THEN 50
WHEN 'low' THEN 20
ELSE 10
END
WHERE base_priority IS NULL;
UPDATE issues i
SET tau_days = c.tau_days_default
FROM confidence_decay_config c
WHERE c.entity_type = 'issue'
AND i.tau_days IS NULL;
```
Do similarly for `vulnerabilities` using severity / CVSS.
### 5.2. Sanity checks
Add a small script/test to verify:
* Newly created items → `confidence ≈ 1.0`.
* 30-day-old items with τ=30 → `confidence ≈ 0.37`.
* Ordering changes when you edit/comment on items.
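A sanity-check script along these lines could be as small as the Python sketch below. It re-implements the confidence function locally rather than calling the real service, so treat it as a template for the assertions, not as a test of the production code.

```python
import math
from datetime import datetime, timedelta, timezone


def confidence(last_signal_at: datetime, now: datetime, tau_days: float = 30.0) -> float:
    t_days = (now - last_signal_at).total_seconds() / 86400
    return 1.0 if t_days <= 0 else math.exp(-t_days / tau_days)


def run_sanity_checks(now: datetime = None) -> bool:
    now = now or datetime.now(timezone.utc)

    # 1. A newly created item starts at full confidence.
    assert confidence(now, now) == 1.0

    # 2. A 30-day-old item with tau = 30 sits at e^(-1), about 0.37.
    thirty_days = confidence(now - timedelta(days=30), now, tau_days=30)
    assert abs(thirty_days - math.exp(-1)) < 1e-9

    # 3. A signal (edit/comment) changes the ordering.
    stale = {"base_priority": 80, "last_signal_at": now - timedelta(days=45)}
    fresh = {"base_priority": 50, "last_signal_at": now - timedelta(days=1)}

    def score(item):
        return item["base_priority"] * confidence(item["last_signal_at"], now)

    assert score(fresh) > score(stale)   # quiet item has drifted down
    stale["last_signal_at"] = now        # simulate a new comment
    assert score(stale) > score(fresh)   # it bubbles back up
    return True


if __name__ == "__main__":
    print("ok" if run_sanity_checks() else "failed")
```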
---
## 6. API & Query Layer
### 6.1. New sorting options
Update list APIs:
* Accept parameter: `sort=effective_priority` or `sort=confidence`.
* Default sort for some views:
* Vulnerabilities backlog: `sort=effective_risk` (risk × confidence).
* Issues backlog: `sort=effective_priority`.
**Example REST API contract:**
`GET /api/issues?sort=effective_priority&state=open`
**Response fields (additions):**
```json
{
"id": "ISS-123",
"title": "Fix login bug",
"base_priority": 80,
"last_signal_at": "2025-11-01T10:00:00Z",
"tau_days": 30,
"confidence": 0.63,
"effective_priority": 50.4,
"confidence_band": "green"
}
```
### 6.2. Confidence banding (for UI)
Define bands server-side (easy to change):
* Green: `confidence >= 0.6`
* Amber: `0.3 ≤ confidence < 0.6`
* Red: `confidence < 0.3`
You can compute on server:
```csharp
string ConfidenceBand(double confidence) =>
confidence >= 0.6 ? "green"
: confidence >= 0.3 ? "amber"
: "red";
```
---
## 7. UI / UX changes
### 7.1. List views (issues / vulns / epics)
For each item row:
* Show a small freshness pill:
* Text: `Active`, `Review soon`, `Stale`
* Derived from confidence band.
* Tooltip:
* “Confidence 90%. Last activity 3 days ago. τ = 30 days.”
* Sort default: by `effective_priority` / `effective_risk`.
* Filters:
* `Freshness: [All | Active | Review soon | Stale]`
* Optionally: “Show stale only” toggle.
**Example labels:**
* Green: “Active (confidence 82%)”
* Amber: “Review soon (confidence 45%)”
* Red: “Stale (confidence 18%)”
### 7.2. Detail views
On an issue / vuln page:
* Add a “Confidence” section:
* “Confidence: **67%**”
* “Last signal: **12 days ago**”
* “Decay τ: **30 days**”
* “Effective priority: **Base 80 × 0.67 ≈ 54**”
* (Optional) small mini-chart (text-only or simple bar) showing approximate decay, but not necessary for first iteration.
### 7.3. Admin / settings UI
Add an internal settings page:
* Table of entity types with editable τ:
| Entity type | τ (days) | Notes |
| ------------- | -------- | ---------------------------- |
| Incident | 7 | Fast-moving |
| Vulnerability | 30 | Standard risk review cadence |
| Issue | 30 | Sprint-level decay |
| Epic | 60 | Quarterly |
| Doc | 90 | Slow decay |
* Optionally: toggle to pin item (`is_confidence_frozen`) from UI.
---
## 8. StellaOps-specific behavior
For vulnerabilities:
### 8.1. Base risk calculation
Ingested fields you likely already have:
* `cvss_score` or `severity`
* `reachable` (true/false or numeric)
* (Optional) `exploit_available` (bool) or exploitability score
* `asset_criticality` (1–5)
Define `base_risk` as:
```text
severity_weight = f(cvss_score or severity)
reachability = reachable ? 1.0 : 0.5 -- example
exploitability = exploit_available ? 1.0 : 0.7
asset_factor = 0.5 + 0.1 * asset_criticality -- 1 → 0.6, 5 → 1.0
base_risk = severity_weight * reachability * exploitability * asset_factor
```
Store `base_risk` on vuln row.
Then:
```text
effective_risk = base_risk * confidence(t)
```
Use `effective_risk` for backlog ordering and SLA dashboards.
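As a worked illustration of the base-risk and effective-risk formulas, the Python sketch below follows the example weights given in this plan. The severity mapping and the multipliers are the plan's examples, and the function names are assumptions; tune both to your own policy.

```python
import math

# Example severity weights from the scoring section; adjust to your policy.
SEVERITY_WEIGHT = {"critical": 100, "high": 80, "medium": 50, "low": 20}


def base_risk(severity: str, reachable: bool, exploit_available: bool,
              asset_criticality: int) -> float:
    """severity_weight x reachability x exploitability x asset_factor."""
    if not 1 <= asset_criticality <= 5:
        raise ValueError("asset_criticality must be between 1 and 5")
    severity_weight = SEVERITY_WEIGHT[severity.lower()]
    reachability = 1.0 if reachable else 0.5
    exploitability = 1.0 if exploit_available else 0.7
    asset_factor = 0.5 + 0.1 * asset_criticality
    return severity_weight * reachability * exploitability * asset_factor


def effective_risk(base: float, days_since_signal: float, tau_days: float = 30.0) -> float:
    """Base risk scaled by confidence(t) = e^(-t/tau)."""
    conf = 1.0 if days_since_signal <= 0 else math.exp(-days_since_signal / tau_days)
    return base * conf


if __name__ == "__main__":
    risk = base_risk("critical", reachable=True, exploit_available=True, asset_criticality=5)
    print(round(risk, 1), round(effective_risk(risk, 30), 1))
```

Storing `base_risk` and computing `effective_risk` on read keeps the decay out of the stored value, so changing τ later needs no data rewrite.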
### 8.2. Signals for vulns
Make sure these all call `RecordSignalAsync(Vulnerability, vulnId)`:
* New scan result for same vuln (re-detected).
* Change status to “In Progress”, “Ready for Deploy”, “Verified Fixed”, etc.
* Assigning an owner.
* Attaching PoC / exploit details.
### 8.3. Vuln UI copy ideas
* Pill text:
* “Risk: 850 (confidence 68%)”
* “Last analyst activity 11 days ago”
* In backlog view: show **Effective Risk** as main sort, with a smaller subtext “Base 1200 × Confidence 71%”.
---
## 9. Rollout plan
### Phase 1 – Infrastructure (backend-only)
* [ ] DB migrations & config table
* [ ] Implement `ConfidenceMath` and helper functions
* [ ] Implement `IConfidenceSignalService`
* [ ] Wire signals into key flows (comments, state changes, scanner ingestion)
* [ ] Add `confidence` and `effective_priority/risk` to API responses
* [ ] Backfill script + dry run in staging
### Phase 2 – Internal UI & feature flag
* [ ] Add optional sorting by effective score to internal/staff views
* [ ] Add confidence pill (hidden behind feature flag `confidence_decay_v1`)
* [ ] Dogfood internally:
* Do items bubble up/down as expected?
* Are any items “disappearing” because decay is too aggressive?
### Phase 3 – Parameter tuning
* [ ] Adjust τ per type based on feedback:
* If things decay too fast → increase τ
* If queues rarely change → decrease τ
* [ ] Decide on confidence floor (0.01? 0.05?) so nothing goes to literal 0.
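When tuning τ and the floor, it can help to translate them into days. The Python helpers below are a small sketch of that arithmetic; the function names are assumptions.

```python
import math


def tau_for_half_life(half_life_days: float) -> float:
    """Tau that makes confidence reach 0.5 after the given number of days."""
    return half_life_days / math.log(2)


def days_until(threshold: float, tau_days: float) -> float:
    """Days of silence before confidence drops to `threshold` (0 < threshold <= 1)."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    return -tau_days * math.log(threshold)


if __name__ == "__main__":
    for label, threshold in (("leaves green", 0.6), ("turns red", 0.3),
                             ("hits 0.05 floor", 0.05), ("hits 0.01 floor", 0.01)):
        print(f"tau=30: {label} after {days_until(threshold, 30):.0f} days")
```

For τ = 30, an untouched item leaves the green band after about 15 days, turns red after about 36 days, and reaches a 0.01 floor after about 138 days (about 90 days for a 0.05 floor).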
### Phase 4 – General release
* [ ] Make effective score the default sort for key views:
* Vulnerabilities backlog
* Issues backlog
* [ ] Document behavior for users (help center / inline tooltip)
* [ ] Add admin UI to tweak τ per entity type.
---
## 10. Edge cases & safeguards
* **New items**
* `last_signal_at = created_at`, confidence = 1.0.
* **Pinned items**
* If `is_confidence_frozen = true` → treat confidence as 1.0.
* **Items without τ**
* Always fallback to entity type default.
* **Timezones**
* Always store & compute in UTC.
* **Very old items**
* Floor the confidence so they're still visible when explicitly searched.
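The safeguards above can be folded into a single function. The following Python sketch assumes dictionary-shaped items with the field names used in this plan and the per-type defaults from the config table; it is one possible arrangement, not a prescribed implementation.

```python
import math
from datetime import datetime, timedelta, timezone

# Per-type defaults mirroring the example config table.
TYPE_DEFAULT_TAU = {"incident": 7.0, "vulnerability": 30.0, "issue": 30.0,
                    "epic": 60.0, "doc": 90.0}
FLOOR = 0.01


def effective_confidence(item: dict, now: datetime) -> float:
    # Pinned items never decay.
    if item.get("is_confidence_frozen"):
        return 1.0
    # New items: last_signal_at falls back to created_at.
    last = item.get("last_signal_at") or item["created_at"]
    # Timezones: refuse naive timestamps instead of guessing.
    if last.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware (UTC)")
    # Items without tau: fall back to the entity-type default.
    tau = item.get("tau_days")
    if not tau or tau <= 0:
        tau = TYPE_DEFAULT_TAU[item["entity_type"]]
    t_days = (now - last).total_seconds() / 86400
    if t_days <= 0:
        return 1.0
    # Very old items: floor the result so they stay visible.
    return max(math.exp(-t_days / tau), FLOOR)
```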
---
If you want, I can turn this into:
* A short **technical design doc** (with sections: Problem, Proposal, Alternatives, Rollout).
* Or a **set of Jira tickets** grouped by backend / frontend / infra that your team can pick up directly.


@@ -0,0 +1,636 @@
Here's a compact, one-screen “CVSS v4.0 Score Receipt” you can drop into StellaOps so every vulnerability carries its score, evidence, and policy lineage end-to-end.
---
# CVSS v4.0 Score Receipt (CVSS-BTE + Supplemental)
**Vuln ID / Title**
**Final CVSS v4.0 Score:** *X.Y* (CVSS-BTE) • **Vector:** `CVSS:4.0/...`
**Why BTE?** CVSS v4.0 combines Base with default Threat/Environmental values first, then amends the score with real context; Supplemental adds non-scoring context. ([FIRST][1])
---
## 1) Base Metrics (intrinsic; vendor/researcher)
*List each metric with chosen value + short justification + evidence link.*
* **Attack Vector (AV):** N | A | L | P - *reason & evidence*
* **Attack Complexity (AC):** L | H - *reason & evidence*
* **Attack Requirements (AT):** N | P - *reason & evidence*
* **Privileges Required (PR):** N | L | H - *reason & evidence*
* **User Interaction (UI):** None | Passive | Active - *reason & evidence*
* **Vulnerable System Impact (VC/VI/VA):** H | L | N - *reason & evidence*
* **Subsequent System Impact (SC/SI/SA):** H | L | N - *reason & evidence*
> Notes: v4.0 clarifies Base, splits vulnerable vs. subsequent system impact, and refines UI (None/Passive/Active). ([FIRST][1])
---
## 2) Threat Metrics (time-varying; consumer)
* **Exploit Maturity (E):** Attacked | POC | Unreported | NotDefined - *intel & source*
* **Automatable (AU)** and **Provider Urgency (U):** these are Supplemental metrics in v4.0, not Threat metrics; record them under section 4 - *tooling/observations, advisory/ref*
> Threat replaces the old Temporal concept and adjusts severity with real-world exploitation context; in v4.0 its only metric is Exploit Maturity. ([FIRST][1])
---
## 3) Environmental Metrics (your environment)
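* **Security Requirements (CR/IR/AR):** High | Medium | Low | NotDefined - *asset classification / control IDs*
* **Modified Base Metrics (MAV, MAC, MAT, MPR, MUI, MVC, MVI, MVA, MSC, MSI, MSA):** *overrides that reflect compensating controls in your environment*
* **Criticality (S, H, L, N) of asset/service:** *business tag* (an internal label, not a CVSS metric)
* **Safety/Human Impact in your environment:** *if applicable* (v4.0 expresses this through the Safety value of MSI/MSA)
> Environmental tailors the score to your environment (security requirements, modified base metrics). ([FIRST][1])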
---
## 4) Supplemental (non-scoring context)
* **Safety (S), Automatable (AU), Recovery (R), Value Density (V), Vulnerability Response Effort (RE), Provider Urgency (U):** *values + short notes*
> Supplemental adds context but does not change the numeric score. ([FIRST][1])
---
## 5) Evidence Ledger
* **Artifacts:** logs, PoCs, packet captures, SBOM slices, callgraphs, config excerpts
* **References:** vendor advisory, NVD/FIRST calculator snapshot, exploit write-ups
* **Timestamps & hash of each evidence item** (SHA-256)
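A ledger entry with a timestamp and SHA-256 digest can be produced with the Python standard library alone. In the sketch below the entry field names (`name`, `type`, `uri`, `sha256`, `recordedAt`) are assumptions modelled on the receipt's evidence list.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def ledger_entry(path: str, evidence_type: str, uri: str) -> dict:
    """Hash a local evidence file in chunks and return a ledger record."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 64 KiB chunks so large pcaps or logs do not load into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return {
        "name": Path(path).name,
        "type": evidence_type,
        "uri": uri,
        "sha256": digest.hexdigest(),
        "recordedAt": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing the stored bytes (rather than a re-rendered copy) is what lets an auditor re-verify the artifact later.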
> Keep a permalink to the FIRST v4.0 calculator or NVD v4 calculator capture for audit. ([FIRST][2])
---
## 6) Policy & Determinism
* **Scoring Policy ID:** `cvss-policy-v4.0-stellaops-YYYYMMDD`
* **Policy Hash:** `sha256:…` (of the JSON policy used to map inputs→metrics)
* **Scoring Engine Version:** `stellaops.scorer vX.Y.Z`
* **Repro Inputs Hash:** DSSE envelope including evidence URIs + CVSS vector
> Treat the receipt as a deterministic artifact: Base with default T/E, then amended with Threat+Environmental to produce CVSS-BTE; store policy/evidence hashes for replayable audits. ([FIRST][1])
---
## 7) History (amendments over time)
| Date | Changed | From → To | Reason | Link |
| ---------- | -------- | -------------- | ------------------------ | ----------- |
| 2025-11-25 | Threat:E | POC → Attacked | Active exploitation seen | *intel ref* |
---
## Minimal JSON schema (for your UI/API)
```json
{
"vulnId": "CVE-YYYY-XXXX",
"title": "Short vuln title",
"cvss": {
"version": "4.0",
"vector": "CVSS:4.0/…",
"base": { "AV": "N", "AC": "L", "AT": "N", "PR": "N", "UI": "P", "VC": "H", "VI": "H", "VA": "H", "SC": "L", "SI": "N", "SA": "N", "justifications": { /* per-metric text + evidence URIs */ } },
"threat": { "E": "Attacked", "evidence": [/* intel links */] },
"environmental": { "requirements": { "CR": "H", "IR": "M", "AR": "L" }, "criticality": "H", "notes": "…" },
"supplemental": { "safety": "Present", "automatable": "Yes", "providerUrgency": "Red", "recovery": "User", "notes": "…" },
"finalScore": 9.1,
"enumeration": "CVSS-BTE"
},
"evidence": [{ "name": "exploit_poc.md", "sha256": "…", "uri": "…" }],
"policy": { "id": "cvss-policy-v4.0-stellaops-20251125", "sha256": "…", "engine": "stellaops.scorer 1.2.0" },
"repro": { "dsseEnvelope": "base64…", "inputsHash": "sha256:…" },
"history": [{ "date": "2025-11-25", "change": "Threat:E POC→Attacked", "reason": "SOC report", "ref": "…" }]
}
```
---
## Drop-in UI wireframe (single screen)
* **Header bar:** Score badge (X.Y), “CVSS-BTE”, vector copy button.
* **Tabs (or stacked cards):** Base • Threat • Environmental • Supplemental • Evidence • Policy • History.
* **Right rail:** “Recalculate with my env” (edits only Threat/Environmental), “Export receipt (JSON/PDF)”, “Open in FIRST/NVD calculator”.
---
If you want, I'll adapt this to your StellaOps components (DTOs, EF Core models, and a Razor/Blazor card) and wire it to your “deterministic replay” pipeline so every scan emits this receipt alongside the VEX note.
[1]: https://www.first.org/cvss/v4-0/specification-document "CVSS v4.0 Specification Document"
[2]: https://www.first.org/cvss/calculator/4-0 "Common Vulnerability Scoring System Version 4.0 Calculator"
Perfect, let's turn that receipt idea into a concrete implementation plan your devs can actually build from.
I'll break it into phases and responsibilities (backend, frontend, platform/DevOps), with enough detail that someone could start creating tickets from this.
---
## 0. Align on Scope & Definitions
**Goal:** For every vulnerability in Stella Ops, store and display a **CVSS v4.0 CVSS-BTE score receipt** that is:
* Deterministic & reproducible (policy + inputs → same score).
* Evidenced (links + hashes of artifacts).
* Auditable over time (history of amendments).
* Friendly to both **vendor/base** and **consumer/threat/env** workflows.
**Key concepts to lock in with the team (no coding yet):**
* **Primary object**: `CvssScoreReceipt` attached to a `Vulnerability`.
* **Canonical score** = **CVSS-BTE** (Base + Threat + Environmental).
* **Base** usually from vendor/researcher; Threat + Environmental from Stella Ops / customer context.
* **Supplemental** metrics: stored but **not part of numeric score**.
* **Policy**: machine-readable config (e.g., JSON) that defines how you map questionnaire/inputs → CVSS metrics.
Deliverable: 2–3 page internal spec summarizing above for devs + PMs.
---
## 1. Data Model Design
### 1.1 Core Entities
*Model names are illustrative; adapt to your stack.*
**Vulnerability**
* `id`
* `externalId` (e.g. CVE)
* `title`
* `description`
* `currentCvssReceiptId` (FK → `CvssScoreReceipt`)
**CvssScoreReceipt**
* `id`
* `vulnerabilityId` (FK)
* `version` (e.g. `"4.0"`)
* `enumeration` (e.g. `"CVSS-BTE"`)
* `vectorString` (full v4.0 vector)
* `finalScore` (numeric, 0.0–10.0)
* `baseScore` (derived or duplicate for convenience)
* `threatScore` (optional interim)
* `environmentalScore` (optional interim)
* `createdAt`
* `createdByUserId`
* `policyId` (FK → `CvssPolicy`)
* `policyHash` (sha256 of policy JSON)
* `inputsHash` (sha256 of normalized scoring inputs)
* `dsseEnvelope` (optional text/blob if you implement full DSSE)
* `metadata` (JSON for any extras you want)
**BaseMetrics (v4.0)**
* `id`, `receiptId` (FK)
* `AV`, `AC`, `AT`, `PR`, `UI`
* `VC`, `VI`, `VA`, `SC`, `SI`, `SA`
* `justifications` (JSON object keyed by metric)
* e.g. `{ "AV": { "reason": "...", "evidenceIds": ["..."] }, ... }`
**ThreatMetrics**
* `id`, `receiptId` (FK)
* `E` (Exploit Maturity; the only Threat metric in v4.0)
* `evidence` (JSON: list of intel references)
**EnvironmentalMetrics**
* `id`, `receiptId` (FK)
* `CR`, `IR`, `AR` (security requirements: H/M/L/X)
* Modified Base metrics (`MAV` through `MSA`), if you override Base values for your environment
* `criticality` (S/H/L/N or your internal enum)
* `notes` (text/JSON)
**SupplementalMetrics**
* `id`, `receiptId` (FK)
* Fields you care about, e.g.:
* `safetyImpact`
* `automatable` (AU)
* `providerUrgency` (U)
* `recoveryEffort`
* `valueDensity`
* `vulnerabilityResponseEffort`
* `notes`
**EvidenceItem**
* `id`
* `receiptId` (FK)
* `name` (e.g. `"exploit_poc.md"`)
* `uri` (link into your blob store, S3, etc.)
* `sha256`
* `type` (log, pcap, exploit, advisory, config, etc.)
* `createdAt`
* `createdBy`
**CvssPolicy**
* `id` (e.g. `cvss-policy-v4.0-stellaops-20251125`)
* `name`
* `version`
* `engineVersion` (e.g. `stellaops.scorer 1.2.0`)
* `policyJson` (JSON)
* `sha256` (policy hash)
* `active` (bool)
* `validFrom`, `validTo` (optional)
**ReceiptHistoryEntry**
* `id`
* `receiptId` (FK)
* `date`
* `changedField` (e.g. `"Threat.E"`)
* `oldValue`
* `newValue`
* `reason`
* `referenceUri` (link to ticket / intel)
* `changedByUserId`
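History entries can be generated mechanically by diffing the old and new metric values. The Python sketch below shows one way to do that; the function name and the dictionary shape are assumptions that mirror the `ReceiptHistoryEntry` fields listed above.

```python
from datetime import datetime, timezone


def history_entries(group: str, old: dict, new: dict, reason: str,
                    reference_uri: str, user_id: str, when: datetime = None) -> list:
    """Return one entry per metric whose value changed between old and new."""
    when = when or datetime.now(timezone.utc)
    entries = []
    # Sort metric names so the output order is stable across runs.
    for metric in sorted(set(old) | set(new)):
        before, after = old.get(metric), new.get(metric)
        if before == after:
            continue
        entries.append({
            "date": when.date().isoformat(),
            "changedField": f"{group}.{metric}",
            "oldValue": before,
            "newValue": after,
            "reason": reason,
            "referenceUri": reference_uri,
            "changedByUserId": user_id,
        })
    return entries


if __name__ == "__main__":
    print(history_entries("Threat", {"E": "POC"}, {"E": "Attacked"},
                          "SOC report", "ticket-ref", "analyst-1"))
```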
---
## 2. Backend Implementation Plan
### 2.1 Scoring Engine
**Tasks:**
1. **Create a `CvssV4Engine` module/package** with:
* `parseVector(string): CvssVector`
* `computeBaseScore(metrics: BaseMetrics): number`
* `computeThreatAdjustedScore(base: number, threat: ThreatMetrics): number`
* `computeEnvironmentalAdjustedScore(threatAdjusted: number, env: EnvironmentalMetrics): number`
* `buildVector(metrics: BaseMetrics & ThreatMetrics & EnvironmentalMetrics): string`
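2. Implement **CVSS v4.0 math** exactly per spec (MacroVector lookup, interpolation, rounding to one decimal place). v4.0 derives B, BT, and BTE from the full vector with unset metrics at their defaults rather than by adjusting a previously computed number, so treat the `computeThreatAdjustedScore` and `computeEnvironmentalAdjustedScore` signatures above as placeholders that should ultimately take metrics, not a prior score.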
3. Add **unit tests** for all official sample vectors + your own edge cases.
**Deliverables:**
* Test suite `CvssV4EngineTests` with:
* Known test vectors (from spec or FIRST calculator)
* Edge cases: missing threat/env, zero-impact vulnerabilities, etc.
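As a starting point for `parseVector`, the Python sketch below validates the prefix, the mandatory Base metrics, and the optional Exploit Maturity metric. It is deliberately partial: it does not compute scores, does not enforce metric ordering, and passes any other metric (Environmental, Supplemental) through unvalidated in an `other` bucket, which is an assumption of this sketch.

```python
# Allowed values for the v4.0 Base metrics and the single Threat metric.
BASE_METRICS = {
    "AV": {"N", "A", "L", "P"}, "AC": {"L", "H"}, "AT": {"N", "P"},
    "PR": {"N", "L", "H"}, "UI": {"N", "P", "A"},
    "VC": {"H", "L", "N"}, "VI": {"H", "L", "N"}, "VA": {"H", "L", "N"},
    "SC": {"H", "L", "N"}, "SI": {"H", "L", "N"}, "SA": {"H", "L", "N"},
}
THREAT_METRICS = {"E": {"X", "A", "P", "U"}}


def parse_vector(vector: str) -> dict:
    parts = vector.split("/")
    if parts[0] != "CVSS:4.0":
        raise ValueError("expected a CVSS:4.0 prefix")
    metrics = {}
    for part in parts[1:]:
        name, sep, value = part.partition(":")
        if not sep or not name or not value:
            raise ValueError(f"malformed component: {part!r}")
        if name in metrics:
            raise ValueError(f"duplicate metric: {name}")
        allowed = BASE_METRICS.get(name) or THREAT_METRICS.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"invalid value {value!r} for {name}")
        metrics[name] = value
    missing = [name for name in BASE_METRICS if name not in metrics]
    if missing:
        raise ValueError(f"missing Base metrics: {', '.join(missing)}")
    return {
        "base": {name: metrics[name] for name in BASE_METRICS},
        "threat": {"E": metrics.get("E", "X")},  # X = Not Defined
        "other": {k: v for k, v in metrics.items()
                  if k not in BASE_METRICS and k not in THREAT_METRICS},
    }
```

Rejecting bad input at parse time keeps malformed vendor vectors from reaching the scoring step.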
---
### 2.2 Receipt Construction Pipeline
Define a canonical function in backend:
```pseudo
function createReceipt(vulnId, input, policyId, userId):
policy = loadPolicy(policyId)
normalizedInput = applyPolicy(input, policy) // map UI questionnaire → CVSS metrics
base = normalizedInput.baseMetrics
threat = normalizedInput.threatMetrics
env = normalizedInput.environmentalMetrics
supplemental = normalizedInput.supplemental
// Score
baseScore = CvssV4Engine.computeBaseScore(base)
threatScore = CvssV4Engine.computeThreatAdjustedScore(baseScore, threat)
finalScore = CvssV4Engine.computeEnvironmentalAdjustedScore(threatScore, env)
// Vector
vector = CvssV4Engine.buildVector({base, threat, env})
// Hashes
inputsHash = sha256(serializeForHashing({ base, threat, env, supplemental, evidenceRefs: input.evidenceIds }))
policyHash = policy.sha256
dsseEnvelope = buildDSSEEnvelope({ vulnId, base, threat, env, supplemental, policyId, policyHash, inputsHash })
// Persist entities in transaction
receipt = saveCvssScoreReceipt(...)
saveBaseMetrics(receipt.id, base)
saveThreatMetrics(receipt.id, threat)
saveEnvironmentalMetrics(receipt.id, env)
saveSupplementalMetrics(receipt.id, supplemental)
linkEvidence(receipt.id, input.evidenceItems)
updateVulnerabilityCurrentReceipt(vulnId, receipt.id)
return receipt
```
**Important implementation details:**
* **`serializeForHashing`**: define a stable ordering and normalization (sorted keys, no whitespace sensitivity, canonical enums) so hashes are truly deterministic.
* Use **transactions** so partial writes never leave `Vulnerability` pointing to incomplete receipts.
* Ensure **idempotency**: if same `inputsHash + policyHash` already exists for that vuln, you can either:
* return existing receipt, or
* create a new one but mark it as a duplicate-of; choose one rule and document it.
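The deterministic hashing and the "return existing receipt" idempotency rule can be sketched in Python as follows. The serialization here is sorted-key compact JSON, which is a simplification and not a full canonicalization scheme such as RFC 8785; `ReceiptStore` is a hypothetical in-memory stand-in for the real persistence layer.

```python
import hashlib
import json


def serialize_for_hashing(payload: dict) -> bytes:
    """Stable bytes: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def inputs_hash(payload: dict) -> str:
    return "sha256:" + hashlib.sha256(serialize_for_hashing(payload)).hexdigest()


class ReceiptStore:
    """Hypothetical store that returns the existing receipt for identical inputs."""

    def __init__(self):
        self._by_key = {}

    def get_or_create(self, vuln_id: str, payload: dict, policy_hash: str):
        key = (vuln_id, inputs_hash(payload), policy_hash)
        if key in self._by_key:
            return self._by_key[key], False   # existing receipt, nothing created
        receipt = {"id": len(self._by_key) + 1, "vulnId": vuln_id,
                   "inputsHash": key[1], "policyHash": policy_hash}
        self._by_key[key] = receipt
        return receipt, True
```

Note that `sort_keys` orders object keys only; lists such as evidence IDs keep their order, so sort them before hashing if their order carries no meaning.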
---
### 2.3 APIs
Design REST/GraphQL endpoints (adapt names to your style):
**Read:**
* `GET /vulnerabilities/{id}/cvss-receipt`
* Returns full receipt with nested metrics, evidence, policy metadata, history.
* `GET /vulnerabilities/{id}/cvss-receipts`
* List historical receipts/versions.
**Create / Update:**
* `POST /vulnerabilities/{id}/cvss-receipt`
* Body: CVSS input payload (not raw metrics) + policyId.
* Backend applies policy → metrics, computes scores, stores receipt.
* `POST /vulnerabilities/{id}/cvss-receipt/recalculate`
* Optional: allows updating **only Threat + Environmental** while preserving Base.
**Evidence:**
* `POST /cvss-receipts/{receiptId}/evidence`
* Upload/link evidence artifacts, compute sha256, associate with receipt.
* (Or integrate with your existing evidence/attachments service and only store references.)
**Policy:**
* `GET /cvss-policies`
* `GET /cvss-policies/{id}`
**History:**
* `GET /cvss-receipts/{receiptId}/history`
Add auth/authorization:
* Only certain roles can **change Base**.
* Different roles can **change Threat/Env**.
* Audit logs for each change.
---
### 2.4 Integration with Existing Pipelines
**Automatic creation paths:**
1. **Scanner import path**
* When new vulnerability is imported with vendor CVSS v4:
* Parse vendor vector → BaseMetrics.
* Use your default policy to set Threat/Env to “NotDefined”.
* Generate initial receipt (tag as `source = "vendor"`).
2. **Manual analyst scoring**
* Analyst opens Vuln in Stella Ops UI.
* Fills out guided form.
* Frontend calls `POST /vulnerabilities/{id}/cvss-receipt`.
3. **Customer-specific Environmental scoring**
* Per-tenant policy stored in `CvssPolicy`.
* Receipts store that policyId; calculating environment-specific scores uses those controls/criticality.
---
## 3. Frontend / UI Implementation Plan
### 3.1 Main “CVSS Score Receipt” Panel
Single screen/card with sections (tabs or accordions):
1. **Header**
* Large score badge: `finalScore` (e.g. 9.1).
* Label: `CVSS v4.0 (CVSS-BTE)`.
* Color-coded severity (Low/Med/High/Critical).
* Copy-to-clipboard for vector string.
* Show Base/Threat/Env sub-scores if you choose to expose.
2. **Base Metrics Section**
* Table or form-like display:
* Each metric: value, short textual description, collapsed justification with “View more”.
* Example row:
* **Attack Vector (AV)**: Network
* “The vulnerability is exploitable over the internet. PoC requires only TCP connectivity to port 443.”
* Evidence chips: `exploit_poc.md`, `nginx_error.log.gz`.
3. **Threat Metrics Section**
* Select control for Exploit Maturity (Automatable and Provider Urgency are Supplemental metrics in v4.0 and belong in the Supplemental section).
* “Intel references” list (URLs or evidence items).
* If the user edits these and clicks **Save**, frontend:
* Builds Threat input payload.
* Calls `POST /vulnerabilities/{id}/cvss-receipt/recalculate` with updated threat/env only.
* Shows new score & appends a `ReceiptHistoryEntry`.
4. **Environmental Section**
* Security requirements (CR/IR/AR): High / Medium / Low / Not Defined.
* Business criticality picker.
* Contextual notes.
* Same recalc flow as Threat.
5. **Supplemental Section**
* Non-scoring fields with clear label: “Does not affect numeric score, for context only”.
6. **Evidence Section**
* List of evidence items with:
* Name, type, hash, link.
* “Attach evidence” button → upload / select existing artifact.
7. **Policy & Determinism Section**
* Display:
* Policy ID + hash.
* Scoring engine version.
* Inputs hash.
* DSSE status (valid / not verified).
* Button: **“Download receipt (JSON)”** uses the JSON schema you already drafted.
* Optional: **“Open in external calculator”** with vector appended as query parameter.
8. **History Section**
* Timeline of changes:
* Date, who, what changed (e.g. `Threat.E: POC → Attacked`).
* Reason + link.
### 3.2 UX Considerations
* **Guardrails:**
* Editing Base metrics: show “This should match vendor or research data. Changing Base will alter historical comparability.”
* Display last updated time & user for each metrics block.
* **Permissions:**
* Disable inputs if user does not have edit rights; still show receipts read-only.
* **Error Handling:**
* Show vector parse or scoring errors clearly, with a reference to policy/engine version.
* **Accessibility:**
* High contrast for severity badges and clear iconography.
---
## 4. JSON Schema & Contracts
You already have a draft JSON; turn it into a formal schema (OpenAPI / JSON Schema) so backend + frontend are in sync.
Example top-level shape (high-level, not full code; the score values are illustrative placeholders, not computed from the vector):
```json
{
"vulnId": "CVE-YYYY-XXXX",
"title": "Short vuln title",
"cvss": {
"version": "4.0",
"enumeration": "CVSS-BTE",
"vector": "CVSS:4.0/...",
"finalScore": 9.1,
"baseScore": 9.3,
"threatScore": 9.3,
"environmentalScore": 9.1,
"base": {
"AV": "N", "AC": "L", "AT": "N", "PR": "N", "UI": "P",
"VC": "H", "VI": "H", "VA": "H",
"SC": "L", "SI": "N", "SA": "N",
"justifications": {
"AV": { "reason": "reachable over internet", "evidence": ["ev1"] }
}
},
"threat": { "E": "Attacked" },
"environmental": { "requirements": { "CR": "H", "IR": "M", "AR": "L" }, "criticality": "H" },
"supplemental": { "safety": "Present", "automatable": "Yes", "providerUrgency": "Red", "recovery": "User" }
},
"evidence": [
{ "id": "ev1", "name": "exploit_poc.md", "uri": "...", "sha256": "..." }
],
"policy": {
"id": "cvss-policy-v4.0-stellaops-20251125",
"sha256": "...",
"engine": "stellaops.scorer 1.2.0"
},
"repro": {
"dsseEnvelope": "base64...",
"inputsHash": "sha256:..."
},
"history": [
{ "date": "2025-11-25", "change": "Threat.E POC→Attacked", "reason": "SOC report", "ref": "..." }
]
}
```
Back-end team: publish this via OpenAPI and keep it versioned.
---
## 5. Security, Integrity & Compliance
**Tasks:**
1. **Evidence Integrity**
* Enforce sha256 on every evidence item.
* Optionally re-hash blob in background and store `verifiedAt` timestamp.
2. **Immutability**
* Decide which parts of a receipt are immutable:
* Typically: Base metrics, evidence links, policy references.
* Threat/Env may change by creating **new receipts** or new “versions” of the same receipt.
* Consider:
* “Current receipt” pointer on Vulnerability.
* All receipts are read-only after creation; changes create new receipt + history entry.
3. **Audit Logging**
* Log who changed what (especially Threat/Env).
* Store reference to ticket / change request.
4. **Access Control**
* RBAC: e.g. `ROLE_SEC_ENGINEER` can set Base; `ROLE_CUSTOMER_ANALYST` can set Env; `ROLE_VIEWER` read-only.
---
## 6. Testing Strategy
**Unit Tests**
* `CvssV4EngineTests` coverage of:
* Vector parsing/serialization.
* Calculations for B, BT, BTE.
* `ReceiptBuilderTests` determinism:
* Same inputs + policy → same score + same hashes.
* Different policyId → different policyHash, different DSSE, even if metrics identical.
**Integration Tests**
* End-to-end:
* Create vulnerability → create receipt with Base only → update Threat → update Env.
* Vendor CVSS import path.
* Permission tests:
* Ensure unauthorized edits are blocked.
**UI Tests**
* Snapshot tests for the card layout.
* Behavior: changing Threat slider updates preview score.
* Accessibility checks (ARIA, focus order).
---
## 7. Rollout Plan
1. **Phase 1: Backend Foundations**
* Implement data model + migrations.
* Implement scoring engine + policies.
* Implement REST/GraphQL endpoints (feature-flagged).
2. **Phase 2: UI MVP**
* Render read-only receipts for a subset of vulnerabilities.
* Internal dogfood with security team.
3. **Phase 3: Editing & Recalc**
* Enable Threat/Env editing.
* Wire evidence upload.
* Activate history tracking.
4. **Phase 4: Vendor Integration + Tenants**
* Map scanner imports → initial Base receipts.
* Tenant-specific Environmental policies.
5. **Phase 5: Hardening**
* Performance tests (bulk listing of vulnerabilities with receipts).
* Security review of evidence and hash handling.
---
If you'd like, I can turn this into:
* A set of Jira/Linear epics + tickets, or
* A stack-specific design (for example: .NET + EF Core models + Razor components, or Node + TypeScript + React components) with concrete code skeletons.


@@ -0,0 +1,563 @@
Here's a crisp, ready-to-use rule for VEX hygiene that will save you pain in audits and customer reviews, and make StellaOps look rock-solid.
# Adopt a strict “`not_affected` only with proof” policy
**What it means (plain English):**
Only mark a vulnerability as `not_affected` if you can *prove* the vulnerable code can't run in your product under defined conditions, then record that proof (scope, entry points, limits) inside a VEX bundle.
## The non-negotiables
* **Audit coverage:**
You must enumerate the reachable entry points you audited (e.g., exported handlers, CLI verbs, HTTP routes, scheduled jobs, init hooks). State their *limits* (versions, build flags, feature toggles, container args, config profiles).
* **VEX justification required:**
Use a concrete justification (OpenVEX/CISA style), e.g.:
* `vulnerable_code_not_in_execute_path`
* `component_not_present`
* `vulnerable_code_cannot_be_controlled_by_adversary`
* `inline_mitigation_already_in_place`
* **Impact or constraint statement:**
Explain *why* it's safe given your product's execution model: sandboxing, dead code elimination, policy blocks, feature gates, OS hardening, container seccomp/AppArmor, etc.
* **VEX proof bundle:**
Store the evidence alongside the VEX: call-graph slices, reachability reports, config snapshots, build args, lattice/policy decisions, test traces, and hashes of the exact artifacts (SBOM + attestation refs). This is what makes the claim stand up in an audit six months later.
## Minimal OpenVEX example (drop-in)
```json
{
"document": {
"id": "urn:stellaops:vex:2025-11-25:svc-api:log4j:2.14.1",
"author": "Stella Ops Authority",
"role": "vex"
},
"statements": [
{
"vulnerability": "CVE-2021-44228",
"products": ["pkg:maven/com.acme/svc-api@1.7.3?type=jar"],
"status": "not_affected",
"justification": "vulnerable_code_not_in_execute_path",
"impact_statement": "Log4j JNDI classes excluded at build; no logger bridge; JVM flags `-Dlog4j2.formatMsgNoLookups=true` enforced by container entrypoint.",
"analysis": {
"entry_points_audited": [
"com.acme.api.HttpServer#routes",
"com.acme.jobs.Cron#run",
"Main#init"
],
"limits": {
"image_digest": "sha256:…",
"config_profile": "prod",
"args": ["--no-dynamic-plugins"],
"seccomp": "stellaops-baseline-v3"
},
"evidence_refs": [
"dsse:sha256:…/reachability.json",
"dsse:sha256:…/build-args.att",
"dsse:sha256:…/policy-lattice.proof"
]
},
"timestamp": "2025-11-25T00:00:00Z"
}
]
}
```
## Fast checklist (use this on every `not_affected`)
* [ ] Define product + artifact by immutable IDs (PURL + digest).
* [ ] List **audited entry points** and **execution limits**.
* [ ] Declare **status** = `not_affected` with a **justification** from the allowed set.
* [ ] Add a short **impact/why-safe** sentence.
* [ ] Attach **evidence**: call graph, configs, policies, build args, test traces.
* [ ] Sign the VEX (DSSE/in-toto), link it to the SBOM attestation.
* [ ] Version and keep the proof bundle with your release.
## When to use an exception (temporary VEX)
If you can prove non-reachability **only under a temporary constraint** (e.g., feature flag off while a permanent fix lands), emit a **time-boxed exception** VEX:
* Add `constraints.expires` and the required control (e.g., `feature_flag=Off`, `policy=BlockJNDI`).
* Schedule an auto-recheck on expiry; flip to `affected` if the constraint lapses.
---
If you want, I can generate a StellaOps-flavored VEX template and a tiny “proof bundle” schema (JSON) so your devs can drop it into the pipeline and your documentation writers can copy-paste the rationale blocks.
Cool, let's turn that policy into something your devs can actually follow day-to-day.
Below is a concrete implementation plan you can drop into an internal RFC / Notion page and wire into your pipelines.
---
## 0. What we're implementing (for context)
**Goal:** At Stella Ops, you can only mark a vulnerability as `not_affected` if:
1. You've **audited specific entry points** under clearly documented limits (version, build flags, config, container image).
2. You've captured **evidence** and **rationale** in a VEX statement + proof bundle.
3. The VEX is **validated, signed, and shipped** with the artifact.
We'll standardize on **OpenVEX** with a small extension (`analysis` section) for developer-friendly evidence.
---
## 1. Repo & artifact layout (week 1)
### 1.1. Create a standard security layout
In each service repo:
```text
/security/
  vex/
    openvex.json         # aggregate VEX doc (generated/curated)
    statements/          # one file per CVE (optional, if you like)
  proofs/
    CVE-YYYY-NNNN/
      reachability.json
      configs/
      tests/
      notes.md
  schemas/
    openvex.schema.json  # JSON schema with Stella extensions
```
**Developer guidance:**
* If you touch anything related to a vulnerability decision, you **edit `security/vex/` and `security/proofs/` in the same PR**.
---
## 2. Define the VEX schema & allowed justifications (week 1)
### 2.1. Fix the format & fields
You've already chosen OpenVEX, so formalize the required extras:
```jsonc
{
"vulnerability": "CVE-2021-44228",
"products": ["pkg:maven/com.acme/svc-api@1.7.3?type=jar"],
"status": "not_affected",
"justification": "vulnerable_code_not_in_execute_path",
"impact_statement": "…",
"analysis": {
"entry_points_audited": [
"com.acme.api.HttpServer#routes",
"com.acme.jobs.Cron#run",
"Main#init"
],
"limits": {
"image_digest": "sha256:…",
"config_profile": "prod",
"args": ["--no-dynamic-plugins"],
"seccomp": "stellaops-baseline-v3"
},
"evidence_refs": [
"dsse:sha256:…/reachability.json",
"dsse:sha256:…/build-args.att",
"dsse:sha256:…/policy-lattice.proof"
]
}
}
```
**Action items:**
* Write a **JSON schema** for the `analysis` block (required for `not_affected`):
* `entry_points_audited`: non-empty array of strings.
* `limits`: object with at least one of `image_digest`, `config_profile`, `args`, `seccomp`, `feature_flags`.
* `evidence_refs`: non-empty array of strings.
* Commit this as `security/schemas/openvex.schema.json`.
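As a starting point for that schema, the three rules above might translate into a JSON Schema fragment along these lines. It is written here as a Python dictionary so it can be dumped to `openvex.schema.json`; the overall statement shape and the conditional (`if`/`then`) wiring are assumptions for illustration, not a published Stella schema.

```python
import json

# Keys listed in the action items above; a `limits` object needs at least one.
LIMIT_KEYS = ["image_digest", "config_profile", "args", "seccomp", "feature_flags"]

# Hypothetical draft-07 style fragment for the `analysis` extension block.
ANALYSIS_SCHEMA = {
    "type": "object",
    "required": ["entry_points_audited", "limits", "evidence_refs"],
    "properties": {
        "entry_points_audited": {
            "type": "array",
            "minItems": 1,
            "items": {"type": "string"},
        },
        "limits": {
            "type": "object",
            # At least one of the known limit keys must be present.
            "anyOf": [{"required": [key]} for key in LIMIT_KEYS],
        },
        "evidence_refs": {
            "type": "array",
            "minItems": 1,
            "items": {"type": "string"},
        },
    },
}

# Statement-level rule: `analysis` and `justification` become mandatory
# only when status is not_affected.
STATEMENT_SCHEMA = {
    "type": "object",
    "required": ["vulnerability", "products", "status"],
    "properties": {
        "status": {"type": "string"},
        "analysis": ANALYSIS_SCHEMA,
    },
    "if": {
        "properties": {"status": {"const": "not_affected"}},
        "required": ["status"],
    },
    "then": {"required": ["justification", "analysis"]},
}

if __name__ == "__main__":
    print(json.dumps(STATEMENT_SCHEMA, indent=2))
```

Keeping the conditional in the schema means the structural rule and the CI check cannot drift apart; the allowed-justification list can then be added as an `enum` once it is final.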
### 2.2. Fix the allowed `justification` values
Publish an internal list, e.g.:
* `vulnerable_code_not_in_execute_path`
* `component_not_present`
* `vulnerable_code_cannot_be_controlled_by_adversary`
* `inline_mitigation_already_in_place`
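* `protected_by_environment` (e.g., mandatory sandbox, read-only FS)
**Rule:** any `not_affected` must pick one of these. Any new justification needs security team approval. Note that this is an internal list: `protected_by_environment` is a Stella-specific addition, and the OpenVEX specification spells the inline-mitigation label `inline_mitigations_already_exist`, so map the internal labels before exporting to tooling that validates against upstream OpenVEX.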
---
## 3. Developer process for handling a new vuln (week 2)
This is the **“how to act”** guide devs follow when a CVE pops up in scanners or customer reports.
### 3.1. Decision flow
1. **Is the vulnerable component actually present?**
* If no → `status: not_affected`, `justification: component_not_present`.
Still fill out `products`, `impact_statement` (explain why it's not present: different version, module excluded, etc.).
2. **If present: analyze reachability.**
* Identify **entry points** of the service:
* HTTP routes, gRPC methods, message consumers, CLI commands, cron jobs, startup hooks.
* Check:
* Is the vulnerable path reachable from any of these?
* Is it blocked by configuration / feature flags / sandboxing?
3. **If reachable or unclear → treat as `affected`.**
* Plan a patch, workaround, or runtime mitigation.
4. **If not reachable & you can argue that clearly → `not_affected` with proof.**
* Fill in:
* `entry_points_audited`
* `limits`
* `evidence_refs`
* `impact_statement` (“why safe”)
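The decision flow above can be condensed into a few lines of code, which is handy for tooling that pre-fills a draft statement. This is a sketch under assumptions: `decide_status` and the `needs_proof` marker are invented names, and the justification chosen for the non-reachable case is only a default that the analyst may need to change.

```python
from typing import Optional


def decide_status(component_present: bool, reachable: Optional[bool]) -> dict:
    """Map the decision flow to a draft VEX status.

    reachable: True, False, or None when the analysis is inconclusive.
    """
    # Step 1: component absent, so no reachability analysis is needed.
    if not component_present:
        return {"status": "not_affected", "justification": "component_not_present"}
    # Step 3: reachable or unclear is always treated as affected.
    if reachable is None or reachable:
        return {"status": "affected"}
    # Step 4: not reachable, but the claim still requires a proof bundle.
    return {
        "status": "not_affected",
        "justification": "vulnerable_code_not_in_execute_path",
        "needs_proof": True,
    }
```

The important property is the conservative default: anything short of a clear "not reachable" falls through to `affected`.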
### 3.2. Developer checklist (drop this into your docs)
> **Stella Ops `not_affected` checklist**
>
> For any CVE you mark as `not_affected`:
>
> 1. **Identify product + artifact**
>
> * [ ] PURL (package URL)
> * [ ] Image digest / binary hash
> 2. **Audit execution**
>
> * [ ] List entry points you reviewed
> * [ ] Note the limits (config profile, feature flags, container args, sandbox)
> 3. **Collect evidence**
>
> * [ ] Reachability analysis (manual or tool report)
> * [ ] Config snapshot (YAML, env vars, Helm values)
> * [ ] Tests or traces (if applicable)
> 4. **Write VEX statement**
>
> * [ ] `status = not_affected`
> * [ ] `justification` from allowed list
> * [ ] `impact_statement` explains “why safe”
> * [ ] `analysis.entry_points_audited`, `analysis.limits`, `analysis.evidence_refs`
> 5. **Wire into repo**
>
> * [ ] Proofs stored under `security/proofs/CVE-…/`
> * [ ] VEX updated under `security/vex/`
> 6. **Request review**
>
> * [ ] Security reviewer approved in PR
---
## 4. Automation & tooling for devs (week 2-3)
Make it easy to “do the right thing” with a small CLI and CI jobs.
### 4.1. Add a small `vexctl` helper
Language doesn't matter; Python is fine. Rough sketch:
```python
#!/usr/bin/env python3
"""Interactive helper that appends a statement to security/vex/openvex.json."""
import json
from datetime import datetime, timezone
from pathlib import Path

VEX_PATH = Path("security/vex/openvex.json")


def load_vex():
    """Return the existing VEX document, or an empty skeleton."""
    if VEX_PATH.exists():
        return json.loads(VEX_PATH.read_text())
    return {"document": {}, "statements": []}


def save_vex(data):
    """Write the document with stable key order so diffs stay small."""
    VEX_PATH.parent.mkdir(parents=True, exist_ok=True)
    VEX_PATH.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n")


def split_csv(raw):
    """Split a comma-separated answer, dropping empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def add_statement():
    cve = input("CVE ID (e.g. CVE-2025-1234): ").strip()
    product = input("Product PURL: ").strip()
    status = input("Status [affected/not_affected/fixed]: ").strip()

    justification = None
    analysis = None

    if status == "not_affected":
        justification = input("Justification (from allowed list): ").strip()
        entry_points = split_csv(input("Entry points (comma-separated): "))
        config_profile = input("Config profile (e.g. prod/stage): ").strip()
        image_digest = input("Image digest (optional): ").strip()
        evidence = split_csv(input("Evidence refs (comma-separated): "))

        # Keep only the limits that were actually provided, so an empty
        # answer cannot slip past the "limits must be non-empty" CI check.
        limits = {}
        if config_profile:
            limits["config_profile"] = config_profile
        if image_digest:
            limits["image_digest"] = image_digest

        analysis = {
            "entry_points_audited": entry_points,
            "limits": limits,
            "evidence_refs": evidence,
        }

    impact = input("Impact / why safe (short text): ").strip()

    vex = load_vex()
    vex.setdefault("document", {})
    vex.setdefault("statements", [])

    stmt = {
        "vulnerability": cve,
        "products": [product],
        "status": status,
        "impact_statement": impact,
        # Timezone-aware UTC timestamp, e.g. 2025-11-25T00:00:00Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if justification:
        stmt["justification"] = justification
    if analysis:
        stmt["analysis"] = analysis

    vex["statements"].append(stmt)
    save_vex(vex)
    print(f"Added VEX statement for {cve}")


if __name__ == "__main__":
    add_statement()
```
**Dev UX:** run:
```bash
./tools/vexctl add
```
and follow prompts instead of hand-editing JSON.
### 4.2. Schema validation in CI
Add a CI job (GitHub Actions example) that:
1. Installs `jsonschema`.
2. Validates `security/vex/openvex.json` against `security/schemas/openvex.schema.json`.
3. Fails if:
* any `not_affected` statement lacks `analysis.*` fields, or
* `justification` is not in the allowed list.
```yaml
name: VEX validation

on:
  pull_request:
    paths:
      - "security/vex/**"
      - "security/schemas/**"

jobs:
  validate-vex:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install deps
        run: pip install jsonschema
      - name: Validate OpenVEX
        run: |
          python tools/validate_vex.py
```
Example `validate_vex.py` core logic:
```python
import json
import sys
from pathlib import Path

from jsonschema import ValidationError, validate

schema = json.loads(Path("security/schemas/openvex.schema.json").read_text())
vex = json.loads(Path("security/vex/openvex.json").read_text())

# 1) Structural check against the JSON schema.
try:
    validate(instance=vex, schema=schema)
except ValidationError as e:
    print("VEX schema validation failed:", e, file=sys.stderr)
    sys.exit(1)

ALLOWED_JUSTIFICATIONS = {
    "vulnerable_code_not_in_execute_path",
    "component_not_present",
    "vulnerable_code_cannot_be_controlled_by_adversary",
    "inline_mitigation_already_in_place",
    "protected_by_environment",
}

# 2) Policy checks that apply only to `not_affected` statements.
for stmt in vex.get("statements", []):
    if stmt.get("status") == "not_affected":
        just = stmt.get("justification")
        if just not in ALLOWED_JUSTIFICATIONS:
            print(f"Invalid justification '{just}' in statement {stmt.get('vulnerability')}")
            sys.exit(1)

        analysis = stmt.get("analysis") or {}
        missing = []
        if not analysis.get("entry_points_audited"):
            missing.append("analysis.entry_points_audited")
        if not analysis.get("limits"):
            missing.append("analysis.limits")
        if not analysis.get("evidence_refs"):
            missing.append("analysis.evidence_refs")

        if missing:
            print(
                f"'not_affected' for {stmt.get('vulnerability')} missing fields: {', '.join(missing)}"
            )
            sys.exit(1)
```
---
## 5. Signing & publishing VEX + proof bundles (week 3)
### 5.1. Signing
Pick a signing mechanism (e.g., DSSE + cosign/in-toto), but keep the dev-visible rules simple:
* CI step:
1. Build artifact (image/binary).
2. Generate/update SBOM.
3. Validate VEX.
4. **Sign**:
* The artifact.
* The SBOM.
* The VEX document.
Enforce **KMS-backed keys** controlled by the security team.
### 5.2. Publishing layout
Decide a canonical layout in your artifact registry / S3:
```text
artifacts/
  svc-api/
    1.7.3/
      image.tar
      sbom.spdx.json
      vex.openvex.json
      proofs/
        CVE-2025-1234/
          reachability.json
          configs/
          tests/
```
Link evidence by digest (`evidence_refs`) so you can prove exactly what you audited.
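One way to produce digest-based `evidence_refs` is to hash every file in a proof bundle at publish time. The sketch below is illustrative only: the function name and the `sha256:<hex>/<relative path>` reference format are assumptions, and the real references in this plan are DSSE-wrapped, which is not modelled here.

```python
import hashlib
from pathlib import Path


def evidence_refs(proof_dir: Path) -> list[str]:
    """Return one digest-based reference per file under proof_dir.

    Files are visited in sorted order so the list is stable across runs.
    """
    refs = []
    for path in sorted(p for p in proof_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        relative = path.relative_to(proof_dir).as_posix()
        refs.append(f"sha256:{digest}/{relative}")
    return refs
```

Because each reference embeds the content hash, any later change to a proof file yields a different reference, so a stale VEX statement cannot silently point at modified evidence.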
---
## 6. PR / review policy (week 3-4)
### 6.1. Add a PR checklist item
In your PR template:
```md
### Security / VEX
- [ ] If this PR **changes how we handle a known CVE** or marks one as `not_affected`, I have:
  - [ ] Updated `security/vex/openvex.json`
  - [ ] Added/updated proof bundle under `security/proofs/`
  - [ ] Ran `./tools/vexctl` and CI VEX validation locally
```
### 6.2. Require security reviewer for `not_affected` changes
Add a CODEOWNERS entry:
```text
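# A trailing slash matches the directory recursively; `/*` would cover only
# direct children and miss nested proof files such as proofs/CVE-…/configs/.
/security/vex/     @stellaops-security-team
/security/proofs/  @stellaops-security-team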
```
* Any PR touching these paths must be approved by security.
---
## 7. Handling temporary exceptions (time-boxed VEX)
Sometimes you're only safe because of a **temporary constraint** (e.g., feature flag off until patch). For those:
1. Add a `constraints` block:
```json
"constraints": {
"control": "feature_flag",
"name": "ENABLE_UNSAFE_PLUGIN_API",
"required_value": "false",
"expires": "2025-12-31T23:59:59Z"
}
```
2. Add a scheduled job (e.g., weekly) that:
* Parses VEX.
* Finds any `constraints.expires < now()`.
* Opens an issue or fails a synthetic CI job: “Constraint expired: re-evaluate CVE-2025-1234”.
Dev guidance: **do not** treat time-boxed exceptions as permanent; they must be re-reviewed or turned into `affected` + mitigation.
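The scheduled expiry check could look roughly like this. It assumes the `constraints` block sits directly on each statement and that `expires` is an ISO 8601 timestamp as in the snippet above; the function name is hypothetical, and opening the issue or failing the job is left to the caller.

```python
from datetime import datetime, timezone
from typing import Optional


def expired_constraints(vex: dict, now: Optional[datetime] = None) -> list:
    """Return the vulnerability IDs whose time-boxed constraint has lapsed."""
    now = now or datetime.now(timezone.utc)
    expired = []
    for stmt in vex.get("statements", []):
        expires = (stmt.get("constraints") or {}).get("expires")
        if not expires:
            continue
        # Older Python versions reject a trailing "Z" in fromisoformat().
        deadline = datetime.fromisoformat(expires.replace("Z", "+00:00"))
        if deadline.tzinfo is None:
            # Treat timestamps without an offset as UTC.
            deadline = deadline.replace(tzinfo=timezone.utc)
        if deadline < now:
            expired.append(stmt.get("vulnerability", "<unknown>"))
    return expired
```

A weekly job can load `security/vex/openvex.json`, call this function, and exit non-zero when the returned list is not empty.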
---
## 8. Rollout plan by week
You can present this timeline internally:
* **Week 1**
* Finalize OpenVEX + `analysis` schema.
* Create `security/` layout in 1-2 key services.
* Publish allowed `justification` list + written policy.
* **Week 2**
* Implement `vexctl` helper.
* Add CI validation job.
* Pilot with one real CVE decision; walk through full proof bundle creation.
* **Week 3**
* Add signing + publishing steps for SBOM and VEX.
* Wire artifact registry layout, link VEX + proofs per release.
* **Week 4**
* Enforce CODEOWNERS + PR checklist across all services.
* Enable scheduled checks for expiring constraints.
* Run internal training (30-45 min) walking through:
* “Bad VEX” (hand-wavy, no entry points) vs
* “Good VEX” (clear scope, evidence, limits).
---
## 9. What you can hand to devs right now
If you want, you can literally paste these as separate internal docs:
* **“How to mark a CVE as not_affected at Stella Ops”**
* Copy section 3 (decision flow + checklist) and the VEX snippet.
* **“VEX technical reference for developers”**
* Copy sections 1, 2, and 4 (structure, schema, CLI, CI validation).
* **“VEX operations runbook”**
* Copy sections 5-7 (signing, publishing, exceptions).
---
If you tell me which CI system you use (GitHub Actions, GitLab CI, Circle, etc.) and your primary stack (Java, Go, Node, etc.), I can turn this into exact job configs and maybe a more tailored `vexctl` CLI for your environment.


@@ -0,0 +1,602 @@
Here's a simple, low-friction way to keep priorities fresh without constant manual grooming: **let confidence decay over time**.
![A small curve sloping down over time, illustrating exponential decay](https://dummyimage.com/800x250/ffffff/000000\&text=confidence\(t\)%20=%20e^{-t/τ})
# Exponential confidence decay (what & why)
* **Idea:** Every item (task, lead, bug, doc, hypothesis) has a confidence score that **automatically shrinks with time** if you don't touch it.
* **Formula:** `confidence(t) = e^(-t/τ)` where `t` is days since last signal (edit, comment, commit, new data), and **τ (“tau”)** is the decay constant.
* **Rule of thumb:** With **τ = 30 days**, at **t = 30** the confidence is **e^(-1) ≈ 0.37**, about a **63% drop**. This surfaces long-ignored items *gradually*, not with harsh “stale/expired” flips.
# How to use it in practice
* **Signals that reset t → 0:** comment on the ticket, new benchmark, fresh log sample, doc update, CI run, new market news.
* **Sort queues by:** `priority × confidence(t)` (or severity × confidence). Quiet items drift down; truly active ones stay up.
* **Escalation bands:**
* `>0.6` = green (recently touched)
* `0.3-0.6` = amber (review soon)
* `<0.3` = red (poke or close)
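The decay formula and the escalation bands fit in a few lines. This is a minimal sketch; the function names are illustrative, and the band edges follow the list above (strictly above 0.6 is green, below 0.3 is red).

```python
import math


def confidence(days_since_signal: float, tau_days: float = 30.0) -> float:
    """Exponential decay: 1.0 at the moment of the last signal, e^-1 after tau days."""
    if days_since_signal <= 0:
        return 1.0
    return math.exp(-days_since_signal / tau_days)


def band(conf: float) -> str:
    """Map a confidence value to the escalation bands listed above."""
    if conf > 0.6:
        return "green"
    if conf >= 0.3:
        return "amber"
    return "red"
```

With τ = 30, an untouched item leaves green after roughly 15 days (30 × ln(1/0.6)) and turns red after roughly 36 days (30 × ln(1/0.3)), which gives a feel for how the thresholds and τ interact.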
# Quick presets
* **Fast-moving queues (incidents, hot leads):** τ = **7-14** days
* **Engineering tasks / product docs:** τ = **30** days
* **Research bets / roadmaps:** τ = **60-90** days
# For your world (StellaOps + ops/dev work)
* **Vuln tickets:** `risk_score = CVSS × reachability × e^(-t/30)`
* **Roadmap epics:** `value_score = impact × e^(-t/60)` to re-rank quarterly.
* **Docs:** show a badge “freshness: 42%” derived from last edit age to nudge updates.
# Minimal implementation sketch
* Store per-item: `last_signal_at`, `base_priority`.
* Compute on read:
```
days = (now - last_signal_at).days
conf = exp(-days / tau)
score = base_priority * conf
```
* Recompute in your API layer or materialize nightly; no cron spam needed.
If you want, I can draft a tiny C# helper (and SQL snippet) you can drop into your issue service to add `confidence(t)` and color bands to your lists.
Perfect, let's turn the idea into something your devs can actually build.
Below is an implementation plan you can drop into a ticketing/PRD, with clear phases, data model changes, APIs, and some sample code (C# + SQL). I'll also sprinkle in StellaOps-specific notes.
---
## 0. Scope & Objectives
**Goal:** Introduce `confidence(t)` as an automatic freshness factor that decays with time and is used to rank and highlight work.
We'll apply it to:
* Vulnerabilities (StellaOps)
* General issues / tasks / epics
* (Optional) Docs, leads, hypotheses later
**Core behavior:**
* Each item has:
* A base priority / risk (from severity, business impact, etc.)
* A timestamp of last signal (meaningful activity)
* A decay rate τ (tau) in days
* Effective priority = `base_priority × confidence(t)`
* `confidence(t) = exp(-t / τ)` where `t` = days since last_signal
---
## 1. Data Model Changes
### 1.1. Add fields to core “work item” tables
For each relevant table (`Issues`, `Vulnerabilities`, `Epics`, …):
**New columns:**
* `base_priority` (FLOAT or INT)
* Example: 1-100, or derived from severity.
* `last_signal_at` (DATETIME, NOT NULL, default = `created_at`)
* `tau_days` (FLOAT, nullable, falls back to type default)
* (Optional) `confidence_cached` (FLOAT, for materialized score)
* (Optional) `is_confidence_frozen` (BOOL, default FALSE)
For pinned items that should not decay.
**Example Postgres migration (Issues):**
```sql
ALTER TABLE issues
ADD COLUMN base_priority DOUBLE PRECISION,
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ADD COLUMN tau_days DOUBLE PRECISION,
ADD COLUMN confidence_cached DOUBLE PRECISION,
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
```
For StellaOps:
```sql
ALTER TABLE vulnerabilities
ADD COLUMN base_risk DOUBLE PRECISION,
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ADD COLUMN tau_days DOUBLE PRECISION,
ADD COLUMN confidence_cached DOUBLE PRECISION,
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
```
### 1.2. Add a config table for τ per entity type
```sql
CREATE TABLE confidence_decay_config (
id SERIAL PRIMARY KEY,
entity_type TEXT NOT NULL, -- 'issue', 'vulnerability', 'epic', 'doc'
tau_days_default DOUBLE PRECISION NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
INSERT INTO confidence_decay_config (entity_type, tau_days_default) VALUES
('incident', 7),
('vulnerability', 30),
('issue', 30),
('epic', 60),
('doc', 90);
```
---
## 2. Define “signal” events & instrumentation
We need a standardized way to say: “this item got activity → reset last_signal_at”.
### 2.1. Signals that should reset `last_signal_at`
For **issues / epics:**
* New comment
* Status change (e.g., Open → In Progress)
* Field change that matters (severity, owner, milestone)
* Attachment added
* Link to PR added or updated
* New CI failure linked
For **vulnerabilities (StellaOps):**
* New scanner result attached or status updated (e.g., “Verified”, “False Positive”)
* New evidence (PoC, exploit notes)
* SLA override change
* Assignment / ownership change
* Integration events (e.g., PR merge that references the vuln)
For **docs (if you do it):**
* Any edit
* Comment/annotation
### 2.2. Implement a shared helper to record a signal
**Service-level helper (pseudocode / C#-ish):**
```csharp
public interface IConfidenceSignalService
{
    Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null);
}

public class ConfidenceSignalService : IConfidenceSignalService
{
    private readonly IWorkItemRepository _repo;
    private readonly IConfidenceConfigService _config;

    // Dependencies are injected; without this constructor the fields stay null.
    public ConfidenceSignalService(IWorkItemRepository repo, IConfidenceConfigService config)
    {
        _repo = repo;
        _config = config;
    }

    public async Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null)
    {
        var now = signalTimeUtc ?? DateTime.UtcNow;
        var item = await _repo.GetByIdAsync(type, itemId);
        if (item == null) return;

        item.LastSignalAt = now;

        // Fall back to the entity-type default when no per-item tau is set.
        if (item.TauDays == null)
        {
            item.TauDays = await _config.GetDefaultTauAsync(type);
        }

        await _repo.UpdateAsync(item);
    }
}
```
### 2.3. Wire signals into existing flows
Create small tasks for devs like:
* **ISS-01:** Call `RecordSignalAsync` on:
* New issue comment handler
* Issue status update handler
* Issue field update handler (severity/priority/owner)
* **VULN-01:** Call `RecordSignalAsync` when:
* New scanner result ingested for a vuln
* Vulnerability status, SLA, or owner changes
* New exploit evidence is attached
---
## 3. Confidence & scoring calculation
### 3.1. Shared confidence function
Definition:
```csharp
public static class ConfidenceMath
{
    // t = days since last signal
    public static double ConfidenceScore(DateTime lastSignalAtUtc, double tauDays, DateTime? nowUtc = null)
    {
        var now = nowUtc ?? DateTime.UtcNow;
        var tDays = (now - lastSignalAtUtc).TotalDays;

        if (tDays <= 0) return 1.0;
        if (tauDays <= 0) return 1.0; // guard / fallback

        var score = Math.Exp(-tDays / tauDays);

        // Optional: never drop below a tiny floor, so items never "disappear"
        const double floor = 0.01;
        return Math.Max(score, floor);
    }
}
```
### 3.2. Effective priority formulas
**Generic issues / tasks:**
```csharp
double effectiveScore = issue.BasePriority * ConfidenceMath.ConfidenceScore(issue.LastSignalAt, issue.TauDays ?? defaultTau);
```
**Vulnerabilities (StellaOps):**
Let's define:
* `severity_weight`: map CVSS or severity string to numeric (e.g. Critical=100, High=80, Medium=50, Low=20).
* `reachability`: 0-1 (e.g. from your reachability analysis).
* `exploitability`: 0-1 (optional, based on known exploits).
* `confidence`: as above.
```csharp
double baseRisk = severityWeight * reachability * exploitability; // or simpler: severityWeight * reachability
double conf = ConfidenceMath.ConfidenceScore(vuln.LastSignalAt, vuln.TauDays ?? defaultTau);
double effectiveRisk = baseRisk * conf;
```
Store `baseRisk` → `vulnerabilities.base_risk`, and compute `effectiveRisk` on the fly or via job.
### 3.3. SQL implementation (optional for server-side sorting)
**Postgres example:**
```sql
-- t_days = age in days
-- tau = tau_days
-- score = exp(-t_days / tau)
SELECT
i.*,
i.base_priority *
GREATEST(
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
0.01
) AS effective_priority
FROM issues i
ORDER BY effective_priority DESC;
```
You can wrap that in a view:
```sql
CREATE VIEW issues_with_confidence AS
SELECT
i.*,
GREATEST(
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
0.01
) AS confidence,
i.base_priority *
GREATEST(
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
0.01
) AS effective_priority
FROM issues i;
```
---
## 4. Caching & performance
You have two options:
### 4.1. Compute on read (simplest to start)
* Use the helper function in your service layer or a DB view.
* Pros:
* No jobs, always fresh.
* Cons:
* Slight CPU cost on heavy lists.
**Plan:** Start with this. If you see perf issues, move to 4.2.
### 4.2. Periodic materialization job (optional later)
Add a scheduled job (e.g. hourly) that:
1. Selects all active items.
2. Computes `confidence_score` and `effective_priority`.
3. Writes to `confidence_cached` and `effective_priority_cached` (if you add such a column).
Service then sorts by cached values.
---
## 5. Backfill & migration
### 5.1. Initial backfill script
For existing records:
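* Reset `last_signal_at` to `created_at` for rows that existed before the migration (the `NOT NULL DEFAULT NOW()` column stamps them with the migration time, so a `NULL` check never matches).
* Derive `base_priority` / `base_risk` from existing severity fields.
* Set `tau_days` from config.
**Example:**
```sql
-- One-time backfill: run right after the migration, before signal recording goes live.
UPDATE issues
SET last_signal_at = created_at;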
UPDATE issues
SET base_priority = CASE severity
WHEN 'critical' THEN 100
WHEN 'high' THEN 80
WHEN 'medium' THEN 50
WHEN 'low' THEN 20
ELSE 10
END
WHERE base_priority IS NULL;
UPDATE issues i
SET tau_days = c.tau_days_default
FROM confidence_decay_config c
WHERE c.entity_type = 'issue'
AND i.tau_days IS NULL;
```
Do similarly for `vulnerabilities` using severity / CVSS.
### 5.2. Sanity checks
Add a small script/test to verify:
* Newly created items → `confidence ≈ 1.0`.
* 30-day-old items with τ=30 → `confidence ≈ 0.37`.
* Ordering changes when you edit/comment on items.
---
## 6. API & Query Layer
### 6.1. New sorting options
Update list APIs:
* Accept parameter: `sort=effective_priority` or `sort=confidence`.
* Default sort for some views:
* Vulnerabilities backlog: `sort=effective_risk` (risk × confidence).
* Issues backlog: `sort=effective_priority`.
**Example REST API contract:**
`GET /api/issues?sort=effective_priority&state=open`
**Response fields (additions):**
```json
{
  "id": "ISS-123",
  "title": "Fix login bug",
  "base_priority": 80,
  "last_signal_at": "2025-11-01T10:00:00Z",
  "tau_days": 30,
  "confidence": 0.63,
  "effective_priority": 50.4,
  "confidence_band": "green"
}
```
### 6.2. Confidence banding (for UI)
Define bands server-side (easy to change):
* Green: `confidence >= 0.6`
* Amber: `0.3 ≤ confidence < 0.6`
* Red: `confidence < 0.3`
You can compute on server:
```csharp
string ConfidenceBand(double confidence) =>
confidence >= 0.6 ? "green"
: confidence >= 0.3 ? "amber"
: "red";
```
---
## 7. UI / UX changes
### 7.1. List views (issues / vulns / epics)
For each item row:
* Show a small freshness pill:
* Text: `Active`, `Review soon`, `Stale`
* Derived from confidence band.
* Tooltip:
* “Confidence 90%. Last activity 3 days ago. τ = 30 days.”
* Sort default: by `effective_priority` / `effective_risk`.
* Filters:
* `Freshness: [All | Active | Review soon | Stale]`
* Optionally: “Show stale only” toggle.
**Example labels:**
* Green: “Active (confidence 82%)”
* Amber: “Review soon (confidence 45%)”
* Red: “Stale (confidence 18%)”
### 7.2. Detail views
On an issue / vuln page:
* Add a “Confidence” section:
* “Confidence: **67%**”
* “Last signal: **12 days ago**”
* “Decay τ: **30 days**”
* “Effective priority: **Base 80 × 0.67 ≈ 54**”
* (Optional) small mini-chart (text-only or simple bar) showing approximate decay, but not necessary for first iteration.
### 7.3. Admin / settings UI
Add an internal settings page:
* Table of entity types with editable τ:
| Entity type | τ (days) | Notes |
| ------------- | -------- | ---------------------------- |
| Incident | 7 | Fast-moving |
| Vulnerability | 30 | Standard risk review cadence |
| Issue | 30 | Sprint-level decay |
| Epic | 60 | Quarterly |
| Doc | 90 | Slow decay |
* Optionally: toggle to pin item (`is_confidence_frozen`) from UI.
---
## 8. StellaOps-specific behavior
For vulnerabilities:
### 8.1. Base risk calculation
Ingested fields you likely already have:
* `cvss_score` or `severity`
* `reachable` (true/false or numeric)
* (Optional) `exploit_available` (bool) or exploitability score
* `asset_criticality` (1-5)
Define `base_risk` as:
```text
severity_weight = f(cvss_score or severity)
reachability = reachable ? 1.0 : 0.5 -- example
exploitability = exploit_available ? 1.0 : 0.7
asset_factor = 0.5 + 0.1 * asset_criticality -- 1 → 0.6, 5 → 1.0
base_risk = severity_weight * reachability * exploitability * asset_factor
```
Store `base_risk` on vuln row.
Then:
```text
effective_risk = base_risk * confidence(t)
```
Use `effective_risk` for backlog ordering and SLA dashboards.
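The base-risk and effective-risk formulas above translate directly into code. The sketch below uses the example factors from this section and the severity weights from section 3.2; the function names, and the fallback weight of 10 for unknown severities (borrowed from the backfill script), are assumptions for illustration.

```python
import math

# Example weights from section 3.2; unknown severities fall back to 10.
SEVERITY_WEIGHT = {"critical": 100, "high": 80, "medium": 50, "low": 20}


def base_risk(severity: str, reachable: bool, exploit_available: bool,
              asset_criticality: int) -> float:
    """Combine severity, reachability, exploitability and asset criticality (1-5)."""
    severity_weight = SEVERITY_WEIGHT.get(severity.lower(), 10)
    reachability = 1.0 if reachable else 0.5
    exploitability = 1.0 if exploit_available else 0.7
    asset_factor = 0.5 + 0.1 * asset_criticality
    return severity_weight * reachability * exploitability * asset_factor


def effective_risk(base: float, days_since_signal: float, tau_days: float = 30.0) -> float:
    """Apply confidence decay to a stored base risk."""
    return base * math.exp(-max(days_since_signal, 0.0) / tau_days)
```

Storing `base_risk` and applying the decay at read time keeps the stored value stable while the ordering still shifts as items go quiet.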
### 8.2. Signals for vulns
Make sure these all call `RecordSignalAsync(Vulnerability, vulnId)`:
* New scan result for same vuln (re-detected).
* Change status to “In Progress”, “Ready for Deploy”, “Verified Fixed”, etc.
* Assigning an owner.
* Attaching PoC / exploit details.
### 8.3. Vuln UI copy ideas
* Pill text:
* “Risk: 850 (confidence 71%)”
* “Last analyst activity 10 days ago”
* In backlog view: show **Effective Risk** as main sort, with a smaller subtext “Base 1200 × Confidence 71%”.
---
## 9. Rollout plan
### Phase 1: Infrastructure (backend-only)
* [ ] DB migrations & config table
* [ ] Implement `ConfidenceMath` and helper functions
* [ ] Implement `IConfidenceSignalService`
* [ ] Wire signals into key flows (comments, state changes, scanner ingestion)
* [ ] Add `confidence` and `effective_priority/risk` to API responses
* [ ] Backfill script + dry run in staging
### Phase 2: Internal UI & feature flag
* [ ] Add optional sorting by effective score to internal/staff views
* [ ] Add confidence pill (hidden behind feature flag `confidence_decay_v1`)
* [ ] Dogfood internally:
* Do items bubble up/down as expected?
* Are any items “disappearing” because decay is too aggressive?
### Phase 3: Parameter tuning
* [ ] Adjust τ per type based on feedback:
* If things decay too fast → increase τ
* If queues rarely change → decrease τ
* [ ] Decide on confidence floor (0.01? 0.05?) so nothing goes to literal 0.
### Phase 4: General release
* [ ] Make effective score the default sort for key views:
* Vulnerabilities backlog
* Issues backlog
* [ ] Document behavior for users (help center / inline tooltip)
* [ ] Add admin UI to tweak τ per entity type.
---
## 10. Edge cases & safeguards
* **New items**
* `last_signal_at = created_at`, confidence = 1.0.
* **Pinned items**
* If `is_confidence_frozen = true` → treat confidence as 1.0.
* **Items without τ**
* Always fallback to entity type default.
* **Timezones**
* Always store & compute in UTC.
* **Very old items**
* Floor the confidence so they're still visible when explicitly searched.
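The safeguards above can be folded into one helper. The sketch below assumes exponential decay, `confidence = exp(-age / τ)`, which is consistent with the τ tuning advice in Phase 3 but is an assumption here. `ConfidenceSketch`, its parameter names, and the 0.05 default floor are illustrative and are not the `ConfidenceMath` API itself.

```csharp
using System;

public static class ConfidenceSketch
{
    // Assumption: exponential decay with time constant tau, floored so that
    // very old items never reach literal 0.
    public static double Confidence(
        DateTime lastSignalAtUtc,
        DateTime nowUtc,
        TimeSpan tau,
        bool isFrozen,
        double floor = 0.05)
    {
        // Pinned items: treat confidence as 1.0.
        if (isFrozen) return 1.0;

        // Timezones: only accept UTC timestamps.
        if (lastSignalAtUtc.Kind != DateTimeKind.Utc || nowUtc.Kind != DateTimeKind.Utc)
            throw new ArgumentException("Timestamps must be UTC.");
        if (tau <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(tau));

        // New items (last_signal_at = created_at = now) get age 0 and confidence 1.0.
        var age = nowUtc - lastSignalAtUtc;
        if (age < TimeSpan.Zero) age = TimeSpan.Zero; // guard against clock skew

        var raw = Math.Exp(-age.TotalDays / tau.TotalDays);
        return Math.Max(floor, raw);
    }

    public static double EffectiveRisk(double baseRisk, double confidence) =>
        baseRisk * confidence;
}
```

The caller resolves τ first (per-item override, else the entity type default), so the helper never sees a missing τ.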
---
If you want, I can turn this into:
* A short **technical design doc** (with sections: Problem, Proposal, Alternatives, Rollout).
* Or a **set of Jira tickets** grouped by backend / frontend / infra that your team can pick up directly.

View File

@@ -0,0 +1,754 @@
Here's a practical way to make a cross-platform, hash-stable JSON “fingerprint” for things like a `graph_revision_id`, so your hashes don't change between OS/locale settings.
---
### What “canonical JSON” means (in plain terms)
* **Deterministic order:** Always write object properties in a fixed order (e.g., lexicographic).
* **Stable numbers:** Serialize numbers the same way everywhere (no locale, no extra zeros).
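* **Normalized text:** Normalize all strings to Unicode **NFC** so accented/combined characters don't vary.
* **Consistent bytes:** Encode as **UTF-8** with **LF** (`\n`) newlines only.
These ideas broadly follow the JSON Canonicalization Scheme (RFC 8785); use it as your north star for stable hashing. Note that RFC 8785 itself preserves string data as-is, so NFC normalization is an extra rule of this scheme, and its number formatting follows ECMAScript rules, which the helper below only approximates.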
---
### Drop-in C# helper (targets .NET 8/10)
This gives you a canonical UTF-8 byte[] and a SHA-256 hex hash. It:
* Recursively sorts object properties,
* Emits numbers with invariant formatting,
* Normalizes all string values to **NFC**,
* Uses `\n` endings,
* Produces a SHA-256 for `graph_revision_id`.
```csharp
using System;
using System.Buffers.Text;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization; // JsonNumberHandling lives in this namespace
using System.Text.Unicode;
public static class CanonJson
{
// Entry point: produce canonical UTF-8 bytes
public static byte[] ToCanonicalUtf8(object? value)
{
// 1) Serialize once to JsonNode to work with types safely
var initialJson = JsonSerializer.SerializeToNode(
value,
new JsonSerializerOptions
{
NumberHandling = JsonNumberHandling.AllowReadingFromString,
Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping // we will control escaping
});
// 2) Canonicalize (sort keys, normalize strings, normalize numbers)
var canonNode = CanonicalizeNode(initialJson);
// 3) Write in a deterministic manner
var sb = new StringBuilder(4096);
WriteCanonical(canonNode!, sb);
// 4) Ensure LF only
var lf = sb.ToString().Replace("\r\n", "\n").Replace("\r", "\n");
// 5) UTF-8 bytes
return Encoding.UTF8.GetBytes(lf);
}
// Convenience: compute SHA-256 hex for graph_revision_id
public static string ComputeGraphRevisionId(object? value)
{
var bytes = ToCanonicalUtf8(value);
using var sha = SHA256.Create();
var hash = sha.ComputeHash(bytes);
var sb = new StringBuilder(hash.Length * 2);
foreach (var b in hash) sb.Append(b.ToString("x2"));
return sb.ToString();
}
// --- Internals ---
private static JsonNode? CanonicalizeNode(JsonNode? node)
{
if (node is null) return null;
switch (node)
{
case JsonValue v:
if (v.TryGetValue<string>(out var s))
{
// Normalize strings to NFC
var nfc = s.Normalize(NormalizationForm.FormC);
return JsonValue.Create(nfc);
}
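// Check integers first: a JsonElement-backed integer also satisfies
// TryGetValue<double>, which would lose precision above 2^53.
if (v.TryGetValue<long>(out var l))
{
return JsonValue.Create(l);
}
if (v.TryGetValue<double>(out var d))
{
// Squash -0 to 0; invariant formatting happens in the writer
if (d == 0) d = 0;
return JsonValue.Create(d);
}
// Fallback (bool etc.): clone, because a node that still has a parent
// cannot be attached to the new tree (requires .NET 8+)
return v.DeepClone();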
case JsonArray arr:
var outArr = new JsonArray();
foreach (var elem in arr)
outArr.Add(CanonicalizeNode(elem));
return outArr;
case JsonObject obj:
// Sort keys lexicographically (RFC 8785 uses code unit order)
var sorted = new JsonObject();
foreach (var kvp in obj.OrderBy(k => k.Key, StringComparer.Ordinal))
sorted[kvp.Key] = CanonicalizeNode(kvp.Value);
return sorted;
default:
return node;
}
}
// Deterministic writer matching our canonical rules
private static void WriteCanonical(JsonNode node, StringBuilder sb)
{
switch (node)
{
case JsonObject obj:
sb.Append('{');
bool first = true;
foreach (var kvp in obj)
{
if (!first) sb.Append(',');
first = false;
WriteString(kvp.Key, sb); // property name
sb.Append(':');
WriteCanonical(kvp.Value!, sb);
}
sb.Append('}');
break;
case JsonArray arr:
sb.Append('[');
for (int i = 0; i < arr.Count; i++)
{
if (i > 0) sb.Append(',');
WriteCanonical(arr[i]!, sb);
}
sb.Append(']');
break;
case JsonValue val:
if (val.TryGetValue<string>(out var s))
{
WriteString(s, sb);
}
else if (val.TryGetValue<long>(out var l))
{
sb.Append(l.ToString(CultureInfo.InvariantCulture));
}
else if (val.TryGetValue<double>(out var d))
{
// Shortest round-trippable form, close to (not identical to) RFC 8785:
// - No NaN/Infinity in JSON
// - Invariant culture; "R" gives "0.1" where "G17" would give "0.10000000000000001"
if (double.IsNaN(d) || double.IsInfinity(d))
throw new InvalidOperationException("Non-finite numbers are not valid in canonical JSON.");
if (d == 0) d = 0; // squash -0
var sNum = d.ToString("R", CultureInfo.InvariantCulture);
// Trim redundant zeros in exponentless decimals
if (sNum.Contains('.') && !sNum.Contains("e") && !sNum.Contains("E"))
{
sNum = sNum.TrimEnd('0').TrimEnd('.');
}
sb.Append(sNum);
}
else
{
// bool / null
if (val.TryGetValue<bool>(out var b))
sb.Append(b ? "true" : "false");
else
sb.Append("null");
}
break;
default:
sb.Append("null");
break;
}
}
private static void WriteString(string s, StringBuilder sb)
{
sb.Append('"');
foreach (var ch in s)
{
switch (ch)
{
case '\"': sb.Append("\\\""); break;
case '\\': sb.Append("\\\\"); break;
case '\b': sb.Append("\\b"); break;
case '\f': sb.Append("\\f"); break;
case '\n': sb.Append("\\n"); break;
case '\r': sb.Append("\\r"); break;
case '\t': sb.Append("\\t"); break;
default:
if (char.IsControl(ch))
{
sb.Append("\\u");
sb.Append(((int)ch).ToString("x4"));
}
else
{
sb.Append(ch);
}
break;
}
}
sb.Append('"');
}
}
```
**Usage in your code (e.g., StellaOps):**
```csharp
var payload = new {
graphId = "core-vuln-edges",
version = 3,
edges = new[]{ new { from = "pkg:nuget/Newtonsoft.Json@13.0.3", to = "pkg:nuget/System.Text.Json@8.0.4" } },
// Keep volatile values such as DateTime.UtcNow out of the hashed payload, or the id changes on every run.
meta = new { source = "scanner-x" }
};
// Canonical bytes (UTF-8 + LF) for storage/attestation:
var canon = CanonJson.ToCanonicalUtf8(payload);
// Stable revision id (SHA-256 hex):
var graphRevisionId = CanonJson.ComputeGraphRevisionId(payload);
Console.WriteLine(graphRevisionId);
```
---
### Operational tips
* **Freeze locales:** Always run with `CultureInfo.InvariantCulture` when formatting numbers/dates before they hit JSON.
* **Reject non-finite numbers:** Don't allow `NaN`/`Infinity`; they're not valid JSON and will break canonicalization.
* **One writer, everywhere:** Use this same helper in CI, build agents, and runtime so the hash never drifts.
* **Record the scheme:** Store the **canonicalization version** (e.g., `canon_v="JCS-like v1"`) alongside the hash to allow future upgrades without breaking verification.
If you want, I can adapt this to stream very large JSONs (avoid `JsonNode`) or emit a **DSSE**/in-toto style envelope with the canonical bytes as the payload for your attestation chain.
Here's a concrete, step-by-step implementation plan you can hand to the devs so they know exactly what to build and how it all fits together.
I'll break it into phases:
1. **Design & scope**
2. **Canonical JSON library**
3. **Graph canonicalization & `graph_revision_id` calculation**
4. **Tooling, tests & cross-platform verification**
5. **Integration & rollout**
---
## 1. Design & scope
### 1.1. Goals
* Produce a **stable, cross-platform hash** (e.g. SHA-256) from JSON content.
* This hash becomes your **`graph_revision_id`** for supply-chain graphs.
* Hash **must not change** due to:
* OS differences (Windows/Linux/macOS)
* Locale differences
* Whitespace/property order differences
* Unicode normalization issues (e.g. accented chars)
### 1.2. Canonicalization strategy (what devs should implement)
You'll use **two levels of canonicalization**:
1. **Domain-level canonicalization (graph)**
Make sure semantically equivalent graphs always serialize to the same in-memory structure:
* Sort arrays (e.g. nodes, edges) in a deterministic way (ID, then type, etc.).
* Remove / ignore non-semantic or unstable fields (timestamps, debug info, transient IDs).
2. **Encoding-level canonicalization (JSON)**
Convert that normalized object into **canonical JSON**:
* Object keys sorted lexicographically (`StringComparer.Ordinal`).
* Strings normalized to **Unicode NFC**.
* Numbers formatted with **InvariantCulture**, no locale effects.
* No NaN/Infinity (reject or map them before hashing).
* UTF-8 output with **LF (`\n`) only**.
You already have a C# canonical JSON helper from me; this plan is about turning it into a production-ready component and wiring it through the system.
---
## 2. Canonical JSON library
**Owner:** backend platform team
**Deliverable:** `StellaOps.CanonicalJson` (or similar) shared library
### 2.1. Project setup
* Create a **.NET class library**:
* `src/StellaOps.CanonicalJson/StellaOps.CanonicalJson.csproj`
* Target same framework as your services (e.g. `net8.0`).
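* No package reference is needed for `System.Text.Json` on `net8.0`, because it ships with the shared framework; add the NuGet package only if you need a newer version.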
### 2.2. Public API design
In `CanonicalJson.cs` (or `CanonJson.cs`):
```csharp
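using System;
using System.Security.Cryptography;
using System.Text;
namespace StellaOps.CanonicalJson;
public static class CanonJson
{
// Version of your canonicalization algorithm (important for future changes)
public const string CanonicalizationVersion = "canon-json-v1";
// ToCanonicalUtf8 carries the canonicalization logic specified below (body omitted in this sketch).
public static byte[] ToCanonicalUtf8<T>(T value) => throw new NotImplementedException();
public static string ToCanonicalString<T>(T value) => Encoding.UTF8.GetString(ToCanonicalUtf8(value));
public static byte[] ComputeSha256<T>(T value) => SHA256.HashData(ToCanonicalUtf8(value));
public static string ComputeSha256Hex<T>(T value) => Convert.ToHexString(ComputeSha256(value)).ToLowerInvariant();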
}
```
**Behavioral requirements:**
* `ToCanonicalUtf8`:
* Serializes input to a `JsonNode`.
* Applies canonicalization rules (sort keys, normalize strings, normalize numbers).
* Writes minimal JSON with:
* No extra spaces.
* Keys in lexicographic order.
* UTF-8 bytes and LF newlines only.
* `ComputeSha256Hex`:
* Uses `ToCanonicalUtf8` and computes SHA-256.
* Returns lowercase hex string.
### 2.3. Canonicalization rules (dev checklist)
**Objects (`JsonObject`):**
* Sort keys using `StringComparer.Ordinal`.
* Recursively canonicalize child nodes.
**Arrays (`JsonArray`):**
* Preserve order as given by caller.
*(The “graph canonicalization” step will make sure this order is semantically stable before JSON.)*
**Strings:**
* Normalize to **NFC**:
```csharp
var normalized = original.Normalize(NormalizationForm.FormC);
```
* When writing JSON:
* Escape `"` and `\` with a backslash (`\"`, `\\`).
* Use the short escapes `\n`, `\r`, `\t`, `\b`, `\f` where they exist, and `\uXXXX` for any other control character (`< 0x20`).
**Numbers:**
* Support at least `long`, `double`, `decimal`.
* Use **InvariantCulture**:
```csharp
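someDouble.ToString("R", CultureInfo.InvariantCulture); // double: shortest round-trippable form
someLong.ToString(CultureInfo.InvariantCulture); // integers: no format specifier needed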
```
* Normalize `-0` to `0`.
* No grouping separators, no locale decimals.
* Reject `NaN`, `+Infinity`, `-Infinity` with a clear exception.
**Booleans & null:**
* Emit `true`, `false`, `null` (lowercase).
**Newlines:**
* Ensure final string has only `\n`:
```csharp
json = json.Replace("\r\n", "\n").Replace("\r", "\n");
```
### 2.4. Error handling & logging
* Throw a **custom exception** for unsupported content:
* `CanonicalJsonException : Exception`.
* Example triggers:
* Non-finite numbers (NaN/Infinity).
* Types that can't be represented in JSON.
* Log the path to the field where canonicalization failed (for debugging).
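One possible shape for the exception and a guard that carries the failing field path is sketched below. The `JsonPath` property, the `CanonicalGuards` helper, and the `$.a.b[0]` path notation are assumptions made for illustration; the plan above only fixes the name `CanonicalJsonException`.

```csharp
using System;

// Thrown when a value cannot be represented in canonical JSON.
public sealed class CanonicalJsonException : Exception
{
    // Path to the offending field, e.g. "$.edges[3].weight" (notation is an assumption).
    public string JsonPath { get; }

    public CanonicalJsonException(string message, string jsonPath, Exception? inner = null)
        : base($"{message} (at {jsonPath})", inner)
    {
        JsonPath = jsonPath;
    }
}

public static class CanonicalGuards
{
    // Returns the value unchanged when it is finite; otherwise throws with the field path.
    public static double RequireFinite(double value, string jsonPath)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
            throw new CanonicalJsonException("Non-finite numbers cannot be canonicalized.", jsonPath);
        return value;
    }
}
```

Because the path is part of the exception, the caller can log `ex.JsonPath` once at the boundary instead of logging inside the canonicalizer.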
---
## 3. Graph canonicalization & `graph_revision_id`
This is where the library gets used and where the semantics of the graph are defined.
**Owner:** team that owns your supply-chain graph model / graph ingestion.
**Deliverables:**
* Domain-specific canonicalization for graphs.
* Stable `graph_revision_id` computation integrated into services.
### 3.1. Define what goes into the hash
Create a short **spec document** (internal) that answers:
1. **What object is being hashed?**
* For example:
```json
{
"graphId": "core-vuln-edges",
"schemaVersion": "3",
"nodes": [...],
"edges": [...],
"metadata": {
"source": "scanner-x",
"epoch": 1732730885
}
}
```
2. **Which fields are included vs excluded?**
* Include:
* Graph identity (ID, schema version).
* Nodes (with stable key set).
* Edges (with stable key set).
* Exclude or **normalize**:
* Raw timestamps of ingestion.
* Non-deterministic IDs (if they're not part of graph semantics).
* Any environment-specific details.
3. **Versioning:**
* Add:
* `canonicalizationVersion` (from `CanonJson.CanonicalizationVersion`).
* `graphHashSchemaVersion` (separate from graph schema version).
Example JSON passed into `CanonJson`:
```json
{
"graphId": "...",
"graphSchemaVersion": "3",
"graphHashSchemaVersion": "1",
"canonicalizationVersion": "canon-json-v1",
"nodes": [...],
"edges": [...]
}
```
### 3.2. Domain-level canonicalizer
Create a class like `GraphCanonicalizer` in your graph domain assembly:
```csharp
public interface IGraphCanonicalizer<TGraph>
{
object ToCanonicalGraphObject(TGraph graph);
}
```
Implementation tasks:
1. **Choose a deterministic ordering for arrays:**
* Nodes: sort by `(nodeType, nodeId)` or `(packageUrl, version)`.
* Edges: sort by `(from, to, edgeType)`.
2. **Strip / transform unstable fields:**
* Example: external IDs that may change but are not semantically relevant.
* Replace `DateTime` with a normalized string format (if it must be part of the semantics).
3. **Output DTOs with primitive types only:**
* Create DTOs like:
```csharp
public sealed record CanonicalNode(
string Id,
string Type,
string Name,
string? Version,
IReadOnlyDictionary<string, string>? Attributes
);
```
* Use simple `record` types / POCOs that serialize cleanly with `System.Text.Json`.
4. **Combine into a single canonical graph object:**
```csharp
public sealed record CanonicalGraphDto(
string GraphId,
string GraphSchemaVersion,
string GraphHashSchemaVersion,
string CanonicalizationVersion,
IReadOnlyList<CanonicalNode> Nodes,
IReadOnlyList<CanonicalEdge> Edges
);
```
`ToCanonicalGraphObject` returns `CanonicalGraphDto`.
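The ordering step is the part most likely to go wrong, so here is a minimal sketch of it in isolation. The `SketchNode`, `SketchEdge`, and `SketchGraph` records are simplified stand-ins invented for this example (the real `CanonicalNode`/`CanonicalEdge` DTOs carry more fields); the point is only that every sort uses `StringComparer.Ordinal` so the order does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the canonical DTOs (hypothetical shapes).
public sealed record SketchNode(string Id, string Type);
public sealed record SketchEdge(string From, string To, string EdgeType);
public sealed record SketchGraph(IReadOnlyList<SketchNode> Nodes, IReadOnlyList<SketchEdge> Edges);

public static class GraphOrdering
{
    // Returns a copy with nodes sorted by (Type, Id) and edges by (From, To, EdgeType).
    public static SketchGraph Canonicalize(SketchGraph graph)
    {
        var nodes = graph.Nodes
            .OrderBy(n => n.Type, StringComparer.Ordinal)
            .ThenBy(n => n.Id, StringComparer.Ordinal)
            .ToList();

        var edges = graph.Edges
            .OrderBy(e => e.From, StringComparer.Ordinal)
            .ThenBy(e => e.To, StringComparer.Ordinal)
            .ThenBy(e => e.EdgeType, StringComparer.Ordinal)
            .ToList();

        return new SketchGraph(nodes, edges);
    }
}
```

The sort keys must together identify an element uniquely; if two edges can share `(From, To, EdgeType)`, add a further tie-breaker so the order is total.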
### 3.3. `graph_revision_id` calculator
Add a service:
```csharp
public interface IGraphRevisionCalculator<TGraph>
{
string CalculateRevisionId(TGraph graph);
}
public sealed class GraphRevisionCalculator<TGraph> : IGraphRevisionCalculator<TGraph>
{
private readonly IGraphCanonicalizer<TGraph> _canonicalizer;
public GraphRevisionCalculator(IGraphCanonicalizer<TGraph> canonicalizer)
{
_canonicalizer = canonicalizer;
}
public string CalculateRevisionId(TGraph graph)
{
var canonical = _canonicalizer.ToCanonicalGraphObject(graph);
return CanonJson.ComputeSha256Hex(canonical);
}
}
```
**Wire this up in DI** for all services that handle graph creation/update.
### 3.4. Persistence & APIs
1. **Database schema:**
* Add a `graph_revision_id` column (string, length 64) to graph tables/collections.
* Optionally add `graph_hash_schema_version` and `canonicalization_version` columns for debugging.
2. **Write path:**
* On graph creation/update:
* Build the domain model.
* Use `GraphRevisionCalculator` to get `graph_revision_id`.
* Store it alongside the graph.
3. **Read path & APIs:**
* Ensure all relevant APIs return `graph_revision_id` for clients.
* If you use it in attestation / DSSE payloads, include it there too.
---
## 4. Tooling, tests & cross-platform verification
This is where you make sure it **actually behaves identically** on all platforms and input variations.
### 4.1. Unit tests for `CanonJson`
Create a dedicated test project: `tests/StellaOps.CanonicalJson.Tests`.
**Test categories & examples:**
1. **Property ordering:**
* Input 1: `{"b":1,"a":2}`
* Input 2: `{"a":2,"b":1}`
* Assert: `ToCanonicalString` is identical + same hash.
2. **Whitespace variations:**
* Input with lots of spaces/newlines vs compact.
* Canonical outputs must match.
3. **Unicode normalization:**
* One string using precomposed characters.
* Same text using combining characters.
* Canonical output must match (NFC).
4. **Number formatting:**
* `1`, `1.0`, `1.0000000000` → must canonicalize to the same representation.
* `-0.0` → canonicalizes to `0`.
5. **Booleans & null:**
* Check exact lowercase output: `true`, `false`, `null`.
6. **Error behaviors:**
* Try serializing `double.NaN` → expect `CanonicalJsonException`.
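Two of these categories rest on runtime behaviour that is easy to check in isolation, before `CanonJson` exists. The sketch below is not part of the planned test project; `CanonicalizationChecks` and its members are names made up for this example. It shows that NFC makes a precomposed and a combining spelling equal, and that number formatting only stays stable when the format provider is pinned.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class CanonicalizationChecks
{
    // "é" as a single code point vs. "e" followed by a combining acute accent (U+0301).
    public const string Precomposed = "caf\u00E9";
    public const string Combining = "cafe\u0301";

    // NFC normalization, as required for all strings before hashing.
    public static string Nfc(string s) => s.Normalize(NormalizationForm.FormC);

    // Invariant formatting: what the canonical writer should always use.
    public static string FormatInvariant(double d) =>
        d.ToString("R", CultureInfo.InvariantCulture);

    // Simulates a decimal-comma locale without depending on installed culture data.
    public static string FormatDecimalComma(double d) =>
        d.ToString("R", new NumberFormatInfo { NumberDecimalSeparator = "," });
}
```

The two raw strings compare unequal ordinally but equal after `Nfc`, and `FormatDecimalComma` shows the kind of drift a locale-dependent `ToString()` would introduce into the hashed bytes.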
### 4.2. Integration tests for graph hashing
Create tests in graph service test project:
1. Build two graphs that are **semantically identical** but:
* Nodes/edges inserted in different order.
* Fields ordered differently.
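* Different whitespace in strings, but only where the domain canonicalizer trims or collapses it; whitespace inside string values is otherwise significant and will change the hash.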
2. Assert:
* `CalculateRevisionId` yields the same result.
* Canonical DTOs match expected snapshots (optional snapshot tests).
3. Build graphs that differ in a meaningful way (e.g., extra edge).
* Assert that `graph_revision_id` is different.
### 4.3. Cross-platform smoke tests
**Goal:** Prove same hash on Windows, Linux and macOS.
Implementation idea:
1. Add a small console tool: `StellaOps.CanonicalJson.Tool`:
* Usage:
`stella-canon hash graph.json`
* Prints:
* Canonical JSON (optional flag).
* SHA-256 hex.
2. In CI:
* Run the same test JSON on:
* Windows runner.
* Linux runner.
* macOS runner.
* Assert hashes are equal (store expected in a test harness or artifact).
---
## 5. Integration into your pipelines & rollout
### 5.1. Where to compute `graph_revision_id`
Decide (and document) **one place** where the ID is authoritative, for example:
* After ingestion + normalization step, **before** persisting to your graph store.
* Or in a dedicated “graph revision service” used by ingestion pipelines.
Implementation:
* Update the ingestion service:
1. Parse incoming data into internal graph model.
2. Apply domain canonicalizer → `CanonicalGraphDto`.
3. Use `GraphRevisionCalculator` → `graph_revision_id`.
4. Persist graph + revision ID.
### 5.2. Migration / backfill plan
If you already have graphs in production:
1. Add new columns/fields for `graph_revision_id` (nullable).
2. Write a migration job:
* Fetch existing graph.
* Canonicalize + hash.
* Store `graph_revision_id`.
3. For a transition period:
* Accept both “old” and “new” graphs.
* Use `graph_revision_id` where available; fall back to legacy IDs when necessary.
4. After backfill is complete:
* Make `graph_revision_id` mandatory for new graphs.
* Phase out any legacy revision logic.
### 5.3. Feature flag & safety
* Gate the use of `graph_revision_id` in high-risk flows (e.g., attestations, policy decisions) behind a **feature flag**:
* `graphRevisionIdEnabled`.
* Roll out gradually:
* Start in staging.
* Then a subset of production tenants.
* Monitor for:
* Unexpected changes in revision IDs on unchanged graphs.
* Errors from `CanonicalJsonException`.
---
## 6. Documentation for developers & operators
Have a short internal doc (or page) with:
1. **Canonical JSON spec summary:**
* Sorting rules.
* Unicode NFC requirement.
* Number format rules.
* Non-finite numbers not allowed.
2. **Graph hashing spec:**
* Fields included in the hash.
* Fields explicitly ignored.
* Array ordering rules for nodes/edges.
* Current:
* `graphHashSchemaVersion = "1"`
* `CanonicalizationVersion = "canon-json-v1"`
3. **Examples:**
* Sample graph JSON input.
* Canonical JSON output.
* Expected SHA-256.
4. **Operational guidance:**
* How to run the CLI tool to debug:
* “Why did this graph get a new `graph_revision_id`?”
* What to do on canonicalization errors (usually indicates bad data).
---
If you'd like, as a next step I can draft the **actual C# projects and folder structure** (with file names + stub code) so your team can just copy/paste the skeleton into the repo and start filling in the domain-specific bits.

View File

@@ -0,0 +1,775 @@
Here's a crisp, practical idea to harden StellaOps: make the SBOM → VEX pipeline **deterministic and verifiable** by treating it as a series of signed, hash-anchored state transitions, so every rebuild yields the *same* provenance envelope you can mathematically check across air-gapped nodes.
---
### What this means (plain English)
* **SBOM** (what's inside): list of packages, files, and their hashes.
* **VEX** (what's affected): statements like “CVE-2024-1234 is **not** exploitable here because X.”
* **Deterministic**: same inputs → byte-identical outputs, every time.
* **Verifiable transitions**: each step (ingest → normalize → resolve → reachability → VEX) emits a signed attestation that pins its inputs/outputs by content hash.
---
### Minimal design you can drop into StellaOps
1. **Canonicalize everything**
* Sort JSON keys, normalize whitespace/line endings.
* Freeze timestamps by recording them only in an outer envelope (not inside payloads used for hashing).
2. **Edge-level attestations**
* For each dependency edge in the reachability graph `(nodeA → nodeB via symbol S)`, emit a tiny DSSE payload:
* `{edge_id, from_purl, to_purl, rule_id, witness_hashes[]}`
* Hash is over the canonical payload; sign via DSSE (Sigstore or your Authority PKI).
3. **Step attestations (pipeline states)**
* For each stage (`Sbomer`, `Scanner`, `Vexer/Excititor`, `Concelier`):
* Emit `predicateType`: `stellaops.dev/attestations/<stage>`
* Include `input_digests[]`, `output_digests[]`, `parameters_digest`, `tool_version`
* Sign with stage key; record the public key (or cert chain) in Authority.
4. **Provenance envelope**
* Build a top-level DSSE that includes:
* Merkle root of **all** edge attestations.
* Merkle roots of each stage's outputs.
* Mapping table of `PURL ↔ build-ID (ELF/PE/Mach-O)` for stable identity.
5. **Replay manifest**
* A single, declarative file that pins:
* Feeds (CPE/CVE/VEX sources + exact digests)
* Rule/lattice versions and parameters
* Container image and layer SHA-256 digests
* Platform toggles (e.g., PQC on/off)
* Running **replay** on this manifest must reproduce the same Merkle roots.
6. **Air-gap sync**
* Export only the envelopes + Merkle roots + public certs.
* On the target, verify chains and recompute roots from the replay manifest—no internet required.
---
### Slim C# shapes (DTOs) for DSSE predicates
```csharp
public record EdgeAttestation(
string EdgeId,
string FromPurl,
string ToPurl,
string RuleId,
string[] WitnessHashes, // e.g., CFG slice, symbol tables, lineage JSON
string CanonicalAlgo = "SHA256");
public record StepAttestation(
string Stage, // "Sbomer" | "Scanner" | "Excititor" | "Concelier"
string ToolVersion,
string[] InputDigests,
string[] OutputDigests,
string ParametersDigest, // hash of canonicalized params
DateTimeOffset StartedAt,
DateTimeOffset FinishedAt);
public record ProvenanceEnvelope(
string ReplayManifestDigest,
string EdgeMerkleRoot,
Dictionary<string,string> StageMerkleRoots, // stage -> root
Dictionary<string,string> PurlToBuildId); // stable identity map
```
---
### Determinism checklist (quick win)
* Canonical JSON (stable key order) everywhere.
* No wall-clock timestamps inside hashed payloads.
* Only reference inputs by digest, never by URL.
* Lock rule sets / lattice policies by digest.
* Normalize file paths (POSIX style) and line endings.
* Container images by **digest**, not tags.
---
### Why it's worth it
* **Auditability:** every VEX claim is backed by a verifiable graph path with signed edges.
* **Reproducibility:** regulators (and customers) can replay your exact scan and get identical roots.
* **Integrity at scale:** air-gapped sites can validate without trusting your network, just the math.
If you want, I'll turn this into ready-to-paste `.proto` contracts + a small .NET library (`StellaOps.Attestations`) with DSSE signing/verification helpers and Merkle builders.
Got it. Let's turn that sketch into a concrete implementation plan your devs can actually execute.
I'll structure this as:
1. **Objectives & scope**
2. **High-level architecture**
3. **Workstreams & milestones**
4. **Detailed tasks per workstream**
5. **Rollout, testing, and ops**
You can copy/paste this straight into a tracking system and break it into tickets.
---
## 1. Objectives & scope
### Primary objectives
* Make the **SBOM → VEX pipeline deterministic**:
* Same inputs (SBOM, feeds, rules, images) → **bit-identical** provenance & VEX outputs.
* Make the pipeline **verifiable**:
* Each step emits **signed attestations** with content hashes.
* Attestations are **chainable** from raw SBOM to VEX & reports.
* Make outputs **replayable** and **air-gap friendly**:
* A single **Replay Manifest** can reconstruct pipeline outputs on another node and verify Merkle roots match.
### Out of scope (for this phase)
* New vulnerability scanning engines.
* New UI views (beyond minimal “show provenance / verify”).
* Key management redesign (we'll integrate with existing Authority / PKI).
---
## 2. High-level architecture
### New shared library
**Library name (example):** `StellaOps.Attestations` (or similar)
Provides:
* Canonical serialization:
* Deterministic JSON encoder (stable key ordering, normalized formatting).
* Hashing utilities:
* SHA-256 (and extension point for future algorithms).
* DSSE wrapper:
* `Sign(payload, keyRef)` → DSSE envelope.
* `Verify(dsse, keyResolver)` → payload + key metadata.
* Merkle utilities:
* Build Merkle trees from lists of digests.
* DTOs:
* `EdgeAttestation`, `StepAttestation`, `ProvenanceEnvelope`, `ReplayManifest`.
### Components that will integrate the library
* **Sbomer** outputs SBOM + StepAttestation.
* **Scanner** consumes SBOM, produces findings + StepAttestation.
* **Excititor / Vexer** takes findings + reachability graph → VEX + EdgeAttestations + StepAttestation.
* **Concelier** takes SBOM + VEX → reports + StepAttestation + ProvenanceEnvelope.
* **Authority** manages keys and verification (possibly separate microservice or shared module).
---
## 3. Workstreams & milestones
Break this into parallel workstreams:
1. **WS1: Canonicalization & hashing**
2. **WS2: DSSE & key integration**
3. **WS3: Attestation schemas & Merkle envelopes**
4. **WS4: Pipeline integration (Sbomer, Scanner, Excititor, Concelier)**
5. **WS5: Replay engine & CLI**
6. **WS6: Verification / air-gap support**
7. **WS7: Testing, observability, and rollout**
Each workstream below has concrete tasks + “Definition of Done” (DoD).
---
## 4. Detailed tasks per workstream
### WS1: Canonicalization & hashing
**Goal:** A small, well-tested core that makes everything deterministic.
#### Tasks
1. **Define canonical JSON format**
* Decision doc:
* Use UTF-8.
* No insignificant whitespace.
* Keys always sorted lexicographically.
* No embedded timestamps or non-deterministic fields inside hashed payloads.
* Implement:
* `CanonicalJsonSerializer.Serialize<T>(T value) : string/byte[]`.
2. **Define deterministic string normalization rules**
* Normalize line endings in any text: `\n` only.
* Normalize paths:
* Use POSIX style `/`.
* Remove trailing slashes (except root).
* Normalize numeric formatting:
* No scientific notation.
* Fixed decimal rules, if relevant.
3. **Implement hashing helper**
* `Digest` type:
```csharp
public record Digest(string Algorithm, string Value); // Algorithm = "SHA256"
```
* `Hashing.ComputeDigest(byte[] data) : Digest`.
* `Hashing.ComputeDigestCanonical<T>(T value) : Digest` (serialize canonically then hash).
4. **Add unit tests & golden files**
* Golden tests:
* Same input object → same canonical JSON & digest, regardless of property order, culture, runtime.
* Hash of JSON must match precomputed values (store `.golden` files in repo).
* Edge cases:
* Unicode strings.
* Nested objects.
* Arrays with different order (order preserved, but ensure same input → same output).
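The hashing helper and the string normalization rules from tasks 2 and 3 can be sketched as follows. `HashingSketch` and its method names are placeholders rather than the final `Hashing` API, and the path rule assumes that a backslash is always a separator, which is true for Windows paths but not for every POSIX file name.

```csharp
using System;
using System.Security.Cryptography;

// Same shape as the Digest record in task 3.
public sealed record Digest(string Algorithm, string Value);

public static class HashingSketch
{
    // SHA-256 over raw bytes, reported as lowercase hex.
    public static Digest ComputeDigest(byte[] data) =>
        new("SHA256", Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant());

    // Line endings: CRLF and lone CR both become LF.
    public static string NormalizeLineEndings(string text) =>
        text.Replace("\r\n", "\n").Replace("\r", "\n");

    // Paths: POSIX-style separators, no trailing slash except for the root.
    public static string NormalizePath(string path)
    {
        var p = path.Replace('\\', '/');
        while (p.Length > 1 && p.EndsWith('/'))
            p = p[..^1];
        return p;
    }
}
```

Normalization has to run before the canonical serializer sees the value, so that two nodes hashing the same logical path or text produce the same digest.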
#### DoD
* Canonical serializer & hashing utilities available in `StellaOps.Attestations`.
* Test suite with >95% coverage for serializer + hashing.
* Simple CLI or test harness:
* `stella-attest dump-canonical <json>` → prints canonical JSON & digest.
---
### WS2: DSSE & key integration
**Goal:** Standardize how we sign and verify attestations.
#### Tasks
1. **Select DSSE representation**
* Use JSON DSSE envelope:
```json
{
"payloadType": "stellaops.dev/attestation/edge@v1",
"payload": "<base64 of canonical JSON>",
"signatures": [{ "keyid": "...", "sig": "..." }]
}
```
2. **Implement DSSE API in library**
* Interfaces:
```csharp
public interface ISigner {
Task<Signature> SignAsync(byte[] payload, string keyRef);
}
public interface IVerifier {
Task<VerificationResult> VerifyAsync(Envelope envelope);
}
```
* Helpers:
* `Dsse.CreateEnvelope(payloadType, canonicalPayloadBytes, signer, keyRef)`.
* `Dsse.VerifyEnvelope(envelope, verifier)`.
3. **Integrate with Authority / PKI**
* Add `AuthoritySigner` / `AuthorityVerifier` implementations:
* `keyRef` is an ID understood by Authority (service name, stage name, or explicit key ID).
* Ensure we can:
* Request signing of arbitrary bytes.
* Resolve the public key used to sign.
4. **Key usage conventions**
* Define mapping:
* `sbomer` key.
* `scanner` key.
* `excititor` key.
* `concelier` key.
* Optional: use distinct keys per environment (dev/stage/prod) but **include environment** in attestation metadata.
5. **Tests**
* Round-trip: sign then verify sample payloads.
* Negative tests:
* Tampered payload → verification fails.
* Tampered signatures → verification fails.
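One detail worth pinning down before implementing `ISigner`: in the DSSE specification the signature is computed over the Pre-Authentication Encoding (PAE) of the payload type and payload, not over the raw payload bytes. If these envelopes are meant to interoperate with standard DSSE tooling, the bytes handed to `SignAsync` would be the PAE output. A minimal sketch follows; the `DssePae` class name is an assumption.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class DssePae
{
    // PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
    // LEN is the byte length written as ASCII decimal digits.
    public static byte[] Encode(string payloadType, byte[] payload)
    {
        byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);

        using var ms = new MemoryStream();
        WriteAscii(ms, "DSSEv1 ");
        WriteAscii(ms, typeBytes.Length.ToString(CultureInfo.InvariantCulture));
        WriteAscii(ms, " ");
        ms.Write(typeBytes, 0, typeBytes.Length);
        WriteAscii(ms, " ");
        WriteAscii(ms, payload.Length.ToString(CultureInfo.InvariantCulture));
        WriteAscii(ms, " ");
        ms.Write(payload, 0, payload.Length);
        return ms.ToArray();
    }

    private static void WriteAscii(Stream stream, string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

Binding the payload type into the signed bytes means a signature for one attestation type cannot be replayed as another, which is why the tamper tests above should also cover a modified `payloadType`.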
#### DoD
* DSSE envelope creation/verification implemented and tested.
* Authority integration with mock/fake for unit tests.
* Documentation for developers:
* “How to emit an attestation: 5-line example.”
---
### WS3: Attestation schemas & Merkle envelopes
**Goal:** Standardize the data models for all attestations and envelopes.
#### Tasks
1. **Define EdgeAttestation schema**
Fields (concrete draft):
```csharp
public record EdgeAttestation(
string EdgeId, // deterministic ID
string FromPurl, // e.g. pkg:maven/...
string ToPurl,
string? FromSymbol, // optional (symbol, API, entry point)
string? ToSymbol,
string RuleId, // which reachability rule fired
Digest[] WitnessDigests, // digests of evidence payloads
string CanonicalAlgo = "SHA256"
);
```
* `EdgeId` convention (document in ADR):
* E.g. `sha256(fromPurl + "→" + toPurl + "|" + ruleId + "|" + fromSymbol + "|" + toSymbol)` (before hashing, canonicalize strings).
2. **Define StepAttestation schema**
```csharp
public record StepAttestation(
string Stage, // "Sbomer" | "Scanner" | ...
string ToolVersion,
Digest[] InputDigests, // SBOM digest, feed digests, image digests
Digest[] OutputDigests, // outputs of this stage
Digest ParametersDigest, // hash of canonicalized params (flags, rule sets, etc.)
DateTimeOffset StartedAt,
DateTimeOffset FinishedAt,
string Environment, // dev/stage/prod/airgap
string NodeId // machine or logical node name
);
```
* Note: `StartedAt` / `FinishedAt` are **not** included in any hashed payload used for determinism; they're OK as metadata but not part of Merkle roots.
3. **Define ProvenanceEnvelope schema**
```csharp
public record ProvenanceEnvelope(
Digest ReplayManifestDigest,
Digest EdgeMerkleRoot,
Dictionary<string, Digest> StageMerkleRoots, // stage -> root digest
Dictionary<string, string> PurlToBuildId // PURL -> build-id string
);
```
4. **Define ReplayManifest schema**
```csharp
public record ReplayManifest(
string PipelineVersion,
Digest SbomDigest,
Digest[] FeedDigests, // CVE, CPE, VEX sources
Digest[] RuleSetDigests, // reachability + policy rules
Digest[] ContainerImageDigests,
string[] PlatformToggles // e.g. ["pqc=on", "mode=strict"]
);
```
5. **Implement Merkle utilities**
* Provide:
* `Digest Merkle.BuildRoot(IEnumerable<Digest> leaves)`.
* Deterministic rules:
* Sort leaves by `Value` (digest hex string) before building.
* If odd number of leaves, duplicate last leaf or define explicit strategy and document it.
* Tie into:
* Edges → `EdgeMerkleRoot`.
* Per stage attestation list → stage-specific root.
6. **Schema documentation**
* Markdown/ADR file:
* Field definitions.
* Which fields are hashed vs. metadata only.
* How `EdgeId`, Merkle roots, and PURL→BuildId mapping are generated.
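As a starting point for the ADR, the `EdgeId` convention from task 1 could look like the sketch below. It follows the concatenation format given there, treats missing symbols as empty strings, and applies NFC before hashing; the `EdgeIdSketch` name and the empty-string rule are assumptions. Plain separators can be ambiguous if a field itself contains `|` or `→`, so the ADR may prefer hashing a canonical JSON array of the fields instead.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class EdgeIdSketch
{
    // sha256(fromPurl + "→" + toPurl + "|" + ruleId + "|" + fromSymbol + "|" + toSymbol)
    // over UTF-8 bytes, returned as lowercase hex.
    public static string Compute(
        string fromPurl,
        string toPurl,
        string ruleId,
        string? fromSymbol,
        string? toSymbol)
    {
        // Canonicalize each part: null becomes empty, then Unicode NFC.
        static string N(string? s) => (s ?? string.Empty).Normalize(NormalizationForm.FormC);

        string material =
            N(fromPurl) + "\u2192" + N(toPurl) + "|" + N(ruleId) + "|" + N(fromSymbol) + "|" + N(toSymbol);

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(material));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```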
#### DoD
* DTOs implemented in shared library.
* Merkle root builder implemented and tested.
* Schema documented and shared across teams.
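For the Merkle root builder, a minimal sketch of the deterministic rules from task 5 follows: leaves are sorted by hex value, and an odd node at any level is paired with itself. It works on hex strings rather than the `Digest` record to stay self-contained, and `MerkleSketch` is a placeholder name. Unlike RFC 6962-style trees it adds no leaf/interior prefix bytes, which is a simplification the schema ADR should decide on explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class MerkleSketch
{
    // Builds a SHA-256 Merkle root (lowercase hex) from hex-encoded leaf digests.
    public static string BuildRoot(IEnumerable<string> leafHexDigests)
    {
        // Deterministic leaf order: lowercase, then ordinal sort by hex value.
        List<byte[]> level = leafHexDigests
            .Select(h => h.ToLowerInvariant())
            .OrderBy(h => h, StringComparer.Ordinal)
            .Select(h => Convert.FromHexString(h))
            .ToList();

        if (level.Count == 0)
            throw new ArgumentException("At least one leaf is required.", nameof(leafHexDigests));

        while (level.Count > 1)
        {
            var next = new List<byte[]>((level.Count + 1) / 2);
            for (int i = 0; i < level.Count; i += 2)
            {
                byte[] left = level[i];
                // Odd count: duplicate the last node.
                byte[] right = i + 1 < level.Count ? level[i + 1] : left;

                var pair = new byte[left.Length + right.Length];
                left.CopyTo(pair, 0);
                right.CopyTo(pair, left.Length);
                next.Add(SHA256.HashData(pair));
            }
            level = next;
        }

        return Convert.ToHexString(level[0]).ToLowerInvariant();
    }
}
```

Because the leaves are sorted first, the root does not depend on the order in which attestations were produced, which is what lets two nodes compare roots after a replay.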
---
### WS4 Pipeline integration
**Goal:** Each stage emits StepAttestations and (for reachability) EdgeAttestations, and Concelier emits ProvenanceEnvelope.
We'll do this stage by stage.
#### WS4.A Sbomer integration
**Tasks**
1. Identify **SBOM hash**:
* After generating SBOM, serialize canonically and compute `Digest`.
2. Collect **inputs**:
* Digests of the input sources (e.g., image digests, source artifact digests).
3. Collect **parameters**:
* All relevant configuration into a `SbomerParams` object:
* E.g. `scanDepth`, `excludedPaths`, `sbomFormat`.
* Canonicalize and compute `ParametersDigest`.
4. Emit **StepAttestation**:
* Create DTO.
* Canonicalize & hash for Merkle tree use.
* Wrap in DSSE envelope with `payloadType = "stellaops.dev/attestation/step@v1"`.
* Store envelope:
* Append to standard location (e.g. `<artifact-root>/attestations/sbomer-step.dsse.json`).
5. Add config flag:
* `--emit-attestations` (default: off initially, later: on by default).
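As a rough illustration of the "canonicalize and compute `ParametersDigest`" step, the sketch below flattens parameters into string key/value pairs, orders the keys ordinally, serializes them as compact JSON, and hashes the bytes. The flattening into strings, the `sha256:` prefix, and the name `ParamsDigestSketch` are assumptions; the real canonicalization rules are whatever the shared library defines.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text.Json;

// Sketch only: a real implementation would reuse the shared canonicalizer.
public static class ParamsDigestSketch
{
    // Returns "sha256:<hex>" over a key-sorted, compact JSON rendering of the parameters.
    public static string Compute(IDictionary<string, string> parameters)
    {
        // Ordinal ordering keeps the key order independent of culture and insertion order.
        var ordered = new SortedDictionary<string, string>(parameters, StringComparer.Ordinal);
        byte[] canonical = JsonSerializer.SerializeToUtf8Bytes(ordered);
        return "sha256:" + Convert.ToHexString(SHA256.HashData(canonical)).ToLowerInvariant();
    }
}
```

List-valued settings such as `excludedPaths` would need to be sorted and joined by the caller before they go into the map, otherwise two equivalent configurations could hash differently.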
#### WS4.B Scanner integration
**Tasks**
1. Take SBOM digest as an **InputDigest**.
2. Collect feed digests:
* Each CVE/CPE/VEX feed file → canonical hash.
3. Compute `ScannerParams` digest:
* E.g. `severityThreshold`, `downloaderOptions`, `scanMode`.
4. Emit **StepAttestation** (same pattern as Sbomer).
5. Tag scanner outputs:
* The vulnerability findings file(s) should be content-addressable (the filename includes the digest, or a meta-manifest maps filenames to digests).
#### WS4.C Excititor/Vexer integration
**Tasks**
1. Integrate reachability graph emission:
* From final graph, **generate EdgeAttestations**:
* One per edge `(from, to, rule)`.
* For each edge, compute witness digests:
* E.g. serialized CFG slice, symbol table snippet, call chain.
* Those witness artifacts should be stored under canonical paths:
* `<artifact-root>/witnesses/<edge-id>/<witness-type>.json`.
2. Canonicalize & hash each EdgeAttestation.
3. Build **Merkle root** over all edge attestation digests.
4. Emit **Excititor StepAttestation**:
* Inputs: SBOM, scanner findings, feeds, rule sets.
* Outputs: VEX document(s), EdgeMerkleRoot digest.
* Params: reachability flags, rule definitions digest.
5. Store:
* Edge attestations:
* Either:
* One DSSE per edge (possibly a lot of files).
* Or a **batch file** containing a list of attestations wrapped into a single DSSE.
* Prefer: **batch** for performance; define `EdgeAttestationBatch` DTO.
* VEX output(s) with deterministic file naming.
#### WS4.D Concelier integration
**Tasks**
1. Gather all **StepAttestations** & **EdgeMerkleRoot**:
* Input: references (paths) to stage outputs + their DSSE envelopes.
2. Build `PurlToBuildId` map:
* For each component:
* Extract PURL from SBOM.
* Extract build-id from binary metadata.
3. Build **StageMerkleRoots**:
* For each stage, compute Merkle root of its StepAttestations.
* In simplest version: 1 step attestation per stage → root is just its digest.
4. Construct **ReplayManifest**:
* From final pipeline context (SBOM, feeds, rules, images, toggles).
* Compute `ReplayManifestDigest` and store manifest file (e.g. `replay-manifest.json`).
5. Construct **ProvenanceEnvelope**:
* Fill fields with digests.
* Canonicalize and sign with Concelier key (DSSE).
6. Store outputs:
* `provenance-envelope.dsse.json`.
* `replay-manifest.json` (unsigned) + optional signed manifest.
#### WS4 DoD
* All four stages can:
* Emit StepAttestations (and EdgeAttestations where applicable).
* Produce a final ProvenanceEnvelope.
* Feature can be toggled via config.
* Pipelines run end-to-end in CI with attestation emission enabled.
---
### WS5 Replay engine & CLI
**Goal:** Given a ReplayManifest, rerun the pipeline and verify that all Merkle roots and digests match.
#### Tasks
1. Implement a **Replay Orchestrator** library:
* Input:
* Path/URL to `replay-manifest.json`.
* Responsibilities:
* Verify the manifest's own digest (and its signature, if signed).
* Fetch or confirm presence of:
* SBOM.
* Feeds.
* Rule sets.
* Container images.
* Spin up each stage with parameters reconstructed from the manifest:
* Ensure versions and flags match.
* Implementation: shared orchestration code reusing existing pipeline entrypoints.
2. Implement **CLI tool**: `stella-attest replay`
* Commands:
* `stella-attest replay run --manifest <path> --out <dir>`.
* Runs pipeline and emits fresh attestations.
* `stella-attest replay verify --manifest <path> --envelope <path> --attest-dir <dir>`:
* Compares:
* Replay Merkle roots vs. `ProvenanceEnvelope`.
* Stage roots.
* Edge root.
* Emits a verification report (JSON + human-readable).
3. Verification logic:
* Steps:
1. Parse ProvenanceEnvelope (verify DSSE signature).
2. Compute Merkle roots from the new replay's attestations.
3. Compare:
* `ReplayManifestDigest` in envelope vs digest of manifest used.
* `EdgeMerkleRoot` vs recalculated root.
* `StageMerkleRoots[stage]` vs recalculated stage roots.
4. Output:
* `verified = true/false`.
* If false, list mismatches with digests.
4. Tests:
* Replay the same pipeline on same machine → must match.
* Replay on different machine (CI job simulating different environment) → must match.
* Injected change in feed or rule set → deliberate mismatch detected.
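The comparison step of the verification logic above might look roughly like this. It is a sketch under assumptions: all digests are passed as hex strings, and `RootMismatch` and `ReplayVerifySketch` are hypothetical names; DSSE signature checking is assumed to have happened before this point.

```csharp
using System;
using System.Collections.Generic;

// One entry per value that differs between the envelope and the replay.
public record RootMismatch(string Name, string Expected, string Actual);

public static class ReplayVerifySketch
{
    // Returns an empty list when every digest and root matches (verified = true).
    public static IReadOnlyList<RootMismatch> Compare(
        string expectedManifestDigest, string actualManifestDigest,
        string expectedEdgeRoot, string actualEdgeRoot,
        IReadOnlyDictionary<string, string> expectedStageRoots,
        IReadOnlyDictionary<string, string> actualStageRoots)
    {
        var mismatches = new List<RootMismatch>();

        void Check(string name, string expected, string actual)
        {
            if (!string.Equals(expected, actual, StringComparison.OrdinalIgnoreCase))
                mismatches.Add(new RootMismatch(name, expected, actual));
        }

        Check("ReplayManifestDigest", expectedManifestDigest, actualManifestDigest);
        Check("EdgeMerkleRoot", expectedEdgeRoot, actualEdgeRoot);

        // Every stage named in the envelope must be reproduced by the replay.
        foreach (var (stage, expected) in expectedStageRoots)
        {
            actualStageRoots.TryGetValue(stage, out var actual);
            Check($"StageMerkleRoots[{stage}]", expected, actual ?? "<missing>");
        }

        // Stages that only the replay produced are also reported.
        foreach (var (stage, actual) in actualStageRoots)
        {
            if (!expectedStageRoots.ContainsKey(stage))
                mismatches.Add(new RootMismatch($"StageMerkleRoots[{stage}]", "<missing>", actual));
        }

        return mismatches;
    }
}
```

The report's `verified` flag is then simply whether the returned list is empty, and the list itself supplies the mismatching digests for the human-readable output.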
#### DoD
* `stella-attest replay` works locally and in CI.
* Documentation: “How to replay a run and verify determinism.”
---
### WS6 Verification / air-gap support
**Goal:** Allow verification in environments without outward network access.
#### Tasks
1. **Define export bundle format**
* Bundle includes:
* `provenance-envelope.dsse.json`.
* `replay-manifest.json`.
* All DSSE attestation files.
* All witness artifacts (or digests only if storage is local).
* Public key material or certificate chains needed to verify signatures.
* Represent as:
* Tarball or zip: e.g. `stella-bundle-<pipeline-id>.tar.gz`.
* Manifest file listing contents and digests.
2. **Implement exporter**
* CLI: `stella-attest export --run-id <id> --out bundle.tar.gz`.
* Internally:
* Collect paths to all relevant artifacts for the run.
* Canonicalize folder structure (e.g. `/sbom`, `/scanner`, `/vex`, `/attestations`, `/witnesses`).
3. **Implement offline verifier**
* CLI: `stella-attest verify-bundle --bundle <path>`.
* Steps:
* Unpack bundle to temp dir.
* Verify:
* Attestation signatures via included public keys.
* Merkle roots and digests as in WS5.
* Do **not** attempt network calls.
4. **Documentation / runbook**
* “How to verify a Stella Ops run in an air-gapped environment.”
* Include:
* How to move bundles (e.g. via USB, secure file transfer).
* What to do if verification fails.
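For the offline verifier in task 3, the "manifest file listing contents and digests" can be checked without any network access. The sketch below assumes the manifest has already been parsed into a map of relative path to lowercase hex SHA-256; that shape and the name `BundleCheckSketch` are illustrative, since the bundle format is still to be defined.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class BundleCheckSketch
{
    // manifest: relative path inside the unpacked bundle -> expected hex SHA-256.
    // Returns human-readable problems; an empty list means every file matched.
    public static List<string> FindProblems(string bundleRoot, IReadOnlyDictionary<string, string> manifest)
    {
        var problems = new List<string>();
        foreach (var (relativePath, expectedHex) in manifest)
        {
            string fullPath = Path.Combine(bundleRoot, relativePath);
            if (!File.Exists(fullPath))
            {
                problems.Add($"missing: {relativePath}");
                continue;
            }

            // Reads the whole file into memory; fine for a sketch, not for very large witnesses.
            byte[] content = File.ReadAllBytes(fullPath);
            string actualHex = Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();
            if (!string.Equals(actualHex, expectedHex, StringComparison.OrdinalIgnoreCase))
                problems.Add($"digest mismatch: {relativePath}");
        }
        return problems;
    }
}
```

A production verifier would also reject manifest paths that escape the bundle root and would verify the manifest's own signature before trusting its digests.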
#### DoD
* Bundles can be exported from a connected environment and verified in a disconnected environment using only the bundle contents.
---
### WS7 Testing, observability, and rollout
**Goal:** Make this robust, observable, and gradually enable in prod.
#### Tasks
1. **Integration tests**
* Full pipeline scenario:
* Start from known SBOM + feeds + rules.
* Run pipeline twice and:
* Compare final outputs: `ProvenanceEnvelope`, VEX doc, final reports.
* Compare digests & Merkle roots.
* Edge cases:
* Different machines (simulate via CI jobs with different runners).
* Missing or corrupted attestation file → verify that verification fails with clear error.
2. **Property-based tests** (optional but great)
* Generate random but structured SBOMs and graphs.
* Ensure:
* Canonicalization is idempotent.
* Hashing is consistent.
* Merkle roots are stable for repeated runs.
3. **Observability**
* Add logging around:
* Attestation creation & signing.
* Verification failures.
* Replay runs.
* Add metrics:
* Number of attestations per run.
* Time spent in canonicalization / hashing / signing.
* Verification success/fail counts.
4. **Rollout plan**
1. **Phase 0 (dev only)**:
* Attestation emission enabled by default in dev.
* Verification run in CI only.
2. **Phase 1 (staging)**:
* Enable dual-path:
* Old behaviour + new attestations.
* Run replay+verify in staging pipeline.
3. **Phase 2 (production, non-enforced)**:
* Enable attestation emission in prod.
* Verification runs “sidecar” but does not block.
4. **Phase 3 (production, enforced)**:
* CI/CD gates:
* Fails if:
* Signatures invalid.
* Merkle roots mismatch.
* Envelope/manifest missing.
5. **Documentation**
* Developer docs:
* “How to emit a StepAttestation from your service.”
* “How to add new fields without breaking determinism.”
* Operator docs:
* “How to run replay & verification.”
* “How to interpret failures and debug.”
#### DoD
* All new functionality covered by automated tests.
* Observability dashboards / alerts configured.
* Rollout phases defined with clear criteria for moving to the next phase.
---
## 5. How to turn this into tickets
You can break this down roughly like:
* **Epic 1:** Attestation core library (WS1 + WS2 + WS3).
* **Epic 2:** Stage integrations (WS4.A–D).
* **Epic 3:** Replay & verification tooling (WS5 + WS6).
* **Epic 4:** Testing, observability, rollout (WS7).
If you want, next step I can:
* Turn each epic into **Jira-style stories** with acceptance criteria.
* Or produce **sample code stubs** (interfaces + minimal implementations) matching this plan.


@@ -0,0 +1,684 @@
I'm sharing this because it closely aligns with your strategy for building strong supply-chain and attestation moats; these are emerging standards you'll want to embed into your architecture now.
### DSSE + in-toto: The event spine
* The Dead Simple Signing Envelope (DSSE) spec defines a minimal JSON envelope for signing arbitrary data — “transparent transport for signed statements”. ([GitHub][1])
* The in-toto Attestation model builds on DSSE as the envelope, with a statement + predicate about the artifact (e.g., build/cohort metadata). ([Legit Security][2])
* In your architecture: using DSSE-signed in-toto attestations across Scanner → Sbomer → Vexer → Scorer → Attestor gives you a unified “event spine” of provenance and attestations.
* That means every step emits a signed, verifiable statement that links one tool's output to the next, which supports deterministic replay and audit integrity.
### CycloneDX v1.7: SBOM + cryptography assurance
* Version 1.7 of CycloneDX was released October 21, 2025 and introduces **advanced cryptography, data-provenance transparency, and IP visibility** for the software supply chain. ([CycloneDX][3])
* It introduces a “Cryptography Registry” to standardize naming / classification of crypto algorithms in BOMs, which is relevant for PQC readiness and for global cryptographic standards like GOST/SM. ([CycloneDX][4])
* If you emit SBOMs in CycloneDX v1.7 format (and include CBOM/crypto details), you're aligning with modern supply-chain trust expectations, which supports your moat #1 (crypto-sovereign readiness) and #2 (deterministic manifests).
### Sigstore Rekor v2: Logging the provenance chain
* Rekor v2 reached GA on October 10, 2025; the redesign introduces a “tile-backed transparency log implementation” to simplify ops and reduce costs. ([Sigstore Blog][5])
* Rekor supports auditing of signing events, monitors to verify append-only consistency, and log inclusion proofs. ([Sigstore][6])
* By bundling your provenance/SBOM/VEX/scores and recording those in Rekor v2, you're closing your chain of custody with immutable log entries, which supports your “Proof-of-Integrity Graph” moat (point #4).
### Why this matters for your architecture
* With each scan or stage (Scanner → Sbomer → Vexer → Scorer → Attestor) producing a DSSE-signed in-toto statement, you have a canonical spine of events.
* Emitting SBOMs in CycloneDX v1.7 ensures you list not only components but also crypto metadata, attestation pointers, and versions, which keeps the format future-proof.
* Recording all artifacts (attestations, SBOM, VEX, scores) into Rekor v2 gives you external public verifiability and auditability: minimal trust surface, maximal transparency.
* These standards map directly to several of your moats: crypto-sovereign readiness, deterministic replayable scans, provenance graphs, trust ledger.
If you like, I can pull together **mappings** of your internal modules (Scanner, Sbomer, Vexer, etc) to these standards and provide a **reference implementation skeleton** in .NET 10 (you indicated you're working with that).
[1]: https://github.com/secure-systems-lab/dsse "secure-systems-lab/dsse - Dead Simple Signing Envelope"
[2]: https://www.legitsecurity.com/blog/slsa-provenance-blog-series-part-1-what-is-software-attestation "SLSA Provenance Blog Series, Part 1: What Is Software ..."
[3]: https://cyclonedx.org/news/cyclonedx-v1.7-released/ "CycloneDX v1.7 Delivers Advanced Cryptography, ..."
[4]: https://cyclonedx.org/registry/cryptography/ "Cryptography Registry"
[5]: https://blog.sigstore.dev/rekor-v2-ga/ "Rekor v2 GA - Cheaper to run, simpler to maintain"
[6]: https://docs.sigstore.dev/logging/overview/ "Rekor"
Got it. Let's turn your vision into something devs can actually build against.
Below is a **concrete implementation plan** you can paste into an internal doc / ticketing system and refine into epics & stories.
---
## 0. Assumptions & Target End-State
**Assumptions**
* Services: `Scanner → Sbomer → Vexer → Scorer → Attestor` (plus shared infra).
* Language: .NET (8/10) for your services.
* You want:
* **DSSE-signed in-toto attestations** as the event “spine”. ([GitHub][1])
* **CycloneDX 1.7 SBOM + VEX** for inventory + exploitability. ([CycloneDX][2])
* **Rekor v2** as the transparency log, with Sigstore bundles for offline verification. ([Sigstore Blog][3])
**Target picture**
For every artifact *A* (image / binary / model):
1. Each stage emits a **DSSE-signed in-toto attestation**:
* Scanner → scan predicate
* Sbomer → CycloneDX 1.7 SBOM predicate
* Vexer → VEX predicate
* Scorer → score predicate
* Attestor → final decision predicate
2. Each attestation is:
* Signed with your keys or Sigstore keyless.
* Logged to Rekor (v2) and optionally packaged into a Sigstore bundle.
3. A consumer can:
* Fetch all attestations for *A*, verify signatures + Rekor proofs, read SBOM/VEX, and understand the score.
The rest of this plan is: **how to get there step-by-step.**
---
## 1. Core Data Contracts (Must Be Done First)
### 1.1 Define the canonical envelope and statement
**Standards to follow**
* **DSSE Envelope** from secure-systems-lab (`envelope.proto`). ([GitHub][1])
* **In-toto Attestation “Statement”** model (subject + predicateType + predicate). ([SLSA][4])
**Deliverable: internal spec**
Create a short internal spec (Markdown) for developers:
* `ArtifactIdentity`
* `algorithm`: `sha256` | `sha512` | etc.
* `digest`: hex string.
* Optional: `name`, `version`, `buildPipelineId`.
* `InTotoStatement<TPredicate>`
* `_type`: fixed: `https://in-toto.io/Statement/v1`
* `subject`: list of `ArtifactIdentity`.
* `predicateType`: string (URL-ish).
* `predicate`: generic JSON (stage-specific payload).
* `DsseEnvelope`
* `payloadType`: e.g. `application/vnd.in-toto+json`
* `payload`: base64 of the JSON `InTotoStatement`.
* `signatures[]`: `{ keyid, sig }`.
### 1.2 Implement the .NET representation
**Tasks**
1. **Generate DSSE envelope types**
* Use `envelope.proto` from DSSE repo and generate C# types; or reuse the Grafeas `Envelope` class which is explicitly aligned with DSSE. ([Google Cloud][5])
* Project: `Attestations.Core`.
2. **Define generic Statement & Predicate types**
In `Attestations.Core`:
```csharp
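using System.Collections.Generic;
using System.Text.Json.Serialization;

public record ArtifactIdentity(string Algorithm, string Digest, string? Name = null, string? Version = null);

// in-toto Statement v1: the JSON field is named "_type", so it is mapped explicitly.
public record InTotoStatement<TPredicate>(
    [property: JsonPropertyName("_type")] string Type,
    IReadOnlyList<ArtifactIdentity> Subject,
    string PredicateType,
    TPredicate Predicate
);

// DSSE names this field "keyid" (all lowercase) on the wire.
public record DsseSignature([property: JsonPropertyName("keyid")] string KeyId, byte[] Sig);

// Payload holds the raw statement bytes; byte[] is written as base64 by System.Text.Json.
public record DsseEnvelope(
    string PayloadType,
    byte[] Payload,
    IReadOnlyList<DsseSignature> Signatures
);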
```
3. **Define predicate contracts for each stage**
Example:
```csharp
public static class PredicateTypes
{
    public const string ScanV1    = "https://example.com/attestations/scan/v1";
    public const string SbomV1    = "https://example.com/attestations/sbom/cyclonedx-1.7";
    public const string VexV1     = "https://example.com/attestations/vex/cyclonedx";
    public const string ScoreV1   = "https://example.com/attestations/score/v1";
    public const string VerdictV1 = "https://example.com/attestations/verdict/v1";
}
```
Then define concrete predicates:
* `ScanPredicateV1`
* `SbomPredicateV1` (likely mostly a pointer to a CycloneDX doc)
* `VexPredicateV1` (pointer to VEX doc + summary)
* `ScorePredicateV1`
* `VerdictPredicateV1` (allow/deny decision + reasoning)
**Definition of done**
* All services share a single `Attestations.Core` library.
* There is a test that serializes + deserializes `InTotoStatement` and `DsseEnvelope` and matches the JSON format expected by in-toto tooling.
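A round-trip check in the spirit of this definition of done could start from wire-level types like the ones below. This is a sketch: `SubjectSketch`, `StatementSketch`, and `StatementJson` are hypothetical names, and the example assumes `System.Text.Json`. Note that on the wire an in-toto subject carries `name` plus a `digest` map keyed by algorithm, which differs from the flat `ArtifactIdentity` shape, so some mapping layer is needed.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Wire shape of an in-toto subject: "digest" maps algorithm name -> hex value.
public record SubjectSketch(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("digest")] IReadOnlyDictionary<string, string> Digest);

// Wire shape of an in-toto Statement v1 with explicit JSON field names.
public record StatementSketch<TPredicate>(
    [property: JsonPropertyName("_type")] string Type,
    [property: JsonPropertyName("subject")] IReadOnlyList<SubjectSketch> Subject,
    [property: JsonPropertyName("predicateType")] string PredicateType,
    [property: JsonPropertyName("predicate")] TPredicate Predicate);

public static class StatementJson
{
    // Serializes with default options; field names come from the attributes above.
    public static string Serialize<T>(StatementSketch<T> statement) =>
        JsonSerializer.Serialize(statement);

    // Returns null only when the JSON document is the literal "null".
    public static StatementSketch<T>? Deserialize<T>(string json) =>
        JsonSerializer.Deserialize<StatementSketch<T>>(json);
}
```

A contract test would serialize a statement, assert that the JSON contains the `_type`, `subject`, `predicateType`, and `predicate` fields, deserialize it again, and compare the fields with the original.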
---
## 2. Signing & Key Management Layer
### 2.1 Abstraction: decouple from crypto choice
Create an internal package: `Attestations.Signing`.
```csharp
public interface IArtifactSigner
{
    Task<DsseEnvelope> SignStatementAsync<TPredicate>(
        InTotoStatement<TPredicate> statement,
        CancellationToken ct = default);
}

public interface IArtifactVerifier
{
    Task VerifyAsync(DsseEnvelope envelope, CancellationToken ct = default);
}
```
Backends to implement:
1. **KMS-backed signer** (e.g., AWS KMS, GCP KMS, Azure Key Vault).
2. **Sigstore keyless / cosign integration**:
* For now you can wrap the **cosign CLI**, which already understands in-toto attestations and Rekor. ([Sigstore][6])
* Later, replace with a native HTTP client against Sigstore services.
### 2.2 Key & algorithm strategy
* Default: **ECDSA P-256** or **Ed25519** keys, stored in KMS.
* Wrap all usage via `IArtifactSigner`/`IArtifactVerifier`.
* Keep room for **PQC migration** by never letting services call crypto APIs directly; only use the abstraction.
**Definition of done**
* CLI or small test harness that:
* Creates a dummy `InTotoStatement`,
* Signs it via `IArtifactSigner`,
* Verifies via `IArtifactVerifier`,
* Fails verification if payload is tampered.
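The harness described above can be prototyped with a local key before any KMS is wired in. The sketch below signs the DSSE pre-authentication encoding (PAE) of the payload, which is what DSSE signatures cover, using an in-memory ECDSA key. `DsseSketch` is a hypothetical name, and the in-memory key stands in for the KMS-backed signer.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class DsseSketch
{
    // DSSE pre-authentication encoding:
    // "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, with lengths as ASCII decimal byte counts.
    public static byte[] Pae(string payloadType, byte[] payload)
    {
        int typeLength = Encoding.UTF8.GetByteCount(payloadType);
        string header =
            "DSSEv1 " + typeLength.ToString(CultureInfo.InvariantCulture) + " " + payloadType + " " +
            payload.Length.ToString(CultureInfo.InvariantCulture) + " ";
        byte[] headerBytes = Encoding.UTF8.GetBytes(header);

        byte[] result = new byte[headerBytes.Length + payload.Length];
        Buffer.BlockCopy(headerBytes, 0, result, 0, headerBytes.Length);
        Buffer.BlockCopy(payload, 0, result, headerBytes.Length, payload.Length);
        return result;
    }

    // Signs PAE(type, payload) with SHA-256; .NET emits the fixed-size r||s signature form by default.
    public static byte[] Sign(ECDsa key, string payloadType, byte[] payload) =>
        key.SignData(Pae(payloadType, payload), HashAlgorithmName.SHA256);

    // Returns false when the payload, the payload type, or the signature was altered.
    public static bool Verify(ECDsa key, string payloadType, byte[] payload, byte[] signature) =>
        key.VerifyData(Pae(payloadType, payload), signature, HashAlgorithmName.SHA256);
}
```

A test key can be created with `ECDsa.Create(ECCurve.NamedCurves.nistP256)`. If the envelopes are to be checked by external tools, the signature encoding (fixed-size versus DER) has to match what those tools expect, which is worth covering in the contract tests.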
---
## 3. Service-by-Service Integration
For each component we'll define **inputs → behavior → attestation output**.
### 3.1 Scanner
**Goal**
For each artifact, emit a **scan attestation** with normalized findings.
**Tasks**
1. Extend Scanner to normalize findings to a canonical model:
* Vulnerability id (CVE / GHSA / etc).
* Affected package (`purl`, version).
* Severity, source (NVD, OSV, etc).
2. Define `ScanPredicateV1`:
```csharp
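// One normalized finding; field names are illustrative and follow the canonical model in task 1.
public record ScanFinding(
    string VulnerabilityId, // CVE / GHSA / etc.
    string PackageUrl,      // purl of the affected package
    string PackageVersion,
    string Severity,
    string Source           // NVD, OSV, etc.
);

public record ScanPredicateV1(
    string ScannerName,
    string ScannerVersion,
    DateTimeOffset ScanTime,
    string ScanConfigurationId,
    IReadOnlyList<ScanFinding> Findings
);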
```
3. After each scan completes:
* Build `ArtifactIdentity` from the artifact digest.
* Build `InTotoStatement<ScanPredicateV1>` with `PredicateTypes.ScanV1`.
* Call `IArtifactSigner.SignStatementAsync`.
* Save `DsseEnvelope` to an **Attestation Store** (see section 5).
* Publish an event `scan.attestation.created` on your message bus with the attestation id.
**Definition of done**
* Every scan results in a stored DSSE envelope with `ScanV1` predicate.
* A consumer service can query by artifact digest and get all scan attestations.
---
### 3.2 Sbomer (CycloneDX 1.7)
**Goal**
Generate **CycloneDX 1.7 SBOMs** and attest to them.
CycloneDX provides a .NET library and tools for producing and consuming SBOMs. ([GitHub][7])
CycloneDX 1.7 adds cryptography registry, data-provenance and IP transparency. ([CycloneDX][2])
**Tasks**
1. Add CycloneDX .NET library
* NuGet: `CycloneDX.Core` (and optional `CycloneDX.Utils`). ([NuGet][8])
2. SBOM generation process
* Input: artifact digest + build metadata (e.g., manifest, lock file).
* Generate a **CycloneDX 1.7 SBOM**:
* Fill `metadata.component`, `bomRef`, and dependency graph.
* Include crypto material using the **Cryptography Registry** (algorithms, key sizes, modes) when relevant. ([CycloneDX][9])
* Include data provenance (tool name/version, timestamp).
3. Storage
* Store SBOM documents (JSON) in object storage: `sboms/{artifactDigest}/cyclonedx-1.7.json`.
* Index them in the Attestation DB (see 5).
4. `SbomPredicateV1`
```csharp
public record SbomPredicateV1(
    string Format,        // "CycloneDX"
    string Version,       // "1.7"
    Uri Location,         // URL to the SBOM blob
    string? HashAlgorithm,
    string? HashDigest    // hash of the SBOM document itself
);
```
5. After SBOM generation:
* Create statement with `PredicateTypes.SbomV1`.
* Sign via `IArtifactSigner`.
* Store DSSE envelope + publish `sbom.attestation.created`.
**Definition of done**
* For any scanned artifact, you can fetch:
* A CycloneDX 1.7 SBOM, and
* A DSSE-signed in-toto SBOM attestation pointing to it.
---
### 3.3 Vexer (CycloneDX VEX / CSAF)
**Goal**
Turn “raw vulnerability findings” into **VEX documents** that say whether each vulnerability is exploitable, using CycloneDX VEX representation. ([CycloneDX][10])
**Tasks**
1. Model VEX status mapping
* Example statuses: `affected`, `not_affected`, `fixed`, `under_investigation`.
* Derive rules from:
* Reachability analysis, config, feature usage.
* Business logic (e.g., vulnerability only affects optional module not shipped).
2. Generate VEX docs
* Use the same CycloneDX .NET library to emit **CycloneDX VEX** documents.
* Store them: `vex/{artifactDigest}/cyclonedx-vex.json`.
3. `VexPredicateV1`
```csharp
public record VexPredicateV1(
    string Format, // "CycloneDX-VEX"
    string Version,
    Uri Location,
    string? HashAlgorithm,
    string? HashDigest,
    int TotalVulnerabilities,
    int ExploitableVulnerabilities
);
```
4. After VEX generation:
* Build statement with `PredicateTypes.VexV1`.
* Sign, store, publish `vex.attestation.created`.
**Definition of done**
* For an artifact with scan results, there is a VEX doc and attestation that:
* Marks each vulnerability with exploitability status.
* Can be consumed by `Scorer` to prioritize risk.
---
### 3.4 Scorer
**Goal**
Compute a **trust/risk score** based on SBOM + VEX + other signals, and attest to it.
**Tasks**
1. Scoring model v1
* Inputs:
* Count of exploitable vulns by severity.
* Presence/absence of required attestations (scan, sbom, vex).
* Age of last scan.
* Output:
* `RiskScore` (0-100 or letter grade).
* `RiskTier` (“low”, “medium”, “high”).
* Reasons (top 3 contributors).
2. `ScorePredicateV1`
```csharp
public record ScorePredicateV1(
    double Score,
    string Tier,
    DateTimeOffset CalculatedAt,
    IReadOnlyList<string> Reasons
);
```
3. When triggered (new VEX or SBOM):
* Recompute score for the artifact.
* Create attestation, sign, store, publish `score.attestation.created`.
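To make the scoring model in task 1 concrete, here is one possible shape. Everything numeric in it is an assumption made for illustration: the penalty weights, the 30-day staleness cutoff, the tier thresholds, and the convention that 100 is the best score. `ScoreInputs` and `ScoringSketch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input shape; the plan lists the signals but not a type.
public record ScoreInputs(
    int CriticalExploitable,
    int HighExploitable,
    bool HasScan,
    bool HasSbom,
    bool HasVex,
    TimeSpan ScanAge);

public static class ScoringSketch
{
    // Starts at 100 and subtracts illustrative penalties; higher means more trusted.
    public static (double Score, string Tier, IReadOnlyList<string> Reasons) Compute(ScoreInputs input)
    {
        var penalties = new List<(double Points, string Reason)>();
        void Penalize(double points, string reason)
        {
            if (points > 0) penalties.Add((points, reason));
        }

        Penalize(25.0 * input.CriticalExploitable, "exploitable critical vulnerabilities");
        Penalize(10.0 * input.HighExploitable, "exploitable high vulnerabilities");
        Penalize(input.HasScan ? 0 : 15, "missing scan attestation");
        Penalize(input.HasSbom ? 0 : 15, "missing SBOM attestation");
        Penalize(input.HasVex ? 0 : 15, "missing VEX attestation");
        Penalize(input.ScanAge > TimeSpan.FromDays(30) ? 10 : 0, "last scan older than 30 days");

        double score = Math.Max(0.0, 100.0 - penalties.Sum(p => p.Points));
        string tier = score >= 80 ? "low" : score >= 50 ? "medium" : "high";

        // Top three contributors, with an ordinal tie-break so the output is stable.
        IReadOnlyList<string> reasons = penalties
            .OrderByDescending(p => p.Points)
            .ThenBy(p => p.Reason, StringComparer.Ordinal)
            .Take(3)
            .Select(p => p.Reason)
            .ToList();

        return (score, tier, reasons);
    }
}
```

The function takes the scan age as an input instead of reading the clock, so the same inputs always give the same score; only `CalculatedAt` in the predicate varies between runs.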
**Definition of done**
* A consumer can call `GET /artifacts/{algo}:{digest}/score` and:
* Verify the DSSE envelope,
* Read a deterministic `ScorePredicateV1`.
---
### 3.5 Attestor (Final Verdict + Rekor integration)
**Goal**
Emit the **final verdict attestation** and push the evidence to Rekor and/or a Sigstore bundle.
**Tasks**
1. `VerdictPredicateV1`
```csharp
public record VerdictPredicateV1(
    string Decision,     // "allow" | "deny" | "quarantine"
    string PolicyVersion,
    DateTimeOffset DecidedAt,
    IReadOnlyList<string> Reasons,
    string? RequestedBy,
    string? Environment  // "prod", "staging", etc.
);
```
2. Policy evaluation:
* Input: all attestations for artifact (scan, sbom, vex, score).
* Apply policy (e.g., “no critical exploitable vulns”, “score ≥ 70”).
* Produce `allow` / `deny` (or `quarantine`, if the policy defines it).
3. Rekor integration (v2-ready)
* Rekor provides an HTTP API and CLI for recording signed metadata. ([Sigstore][11])
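* Rekor v2 uses a modern tile-backed log for better cost/ops. You don't need the storage details, but confirm endpoints and supported entry types against the v2 documentation rather than assuming the v1 API carries over unchanged. ([Sigstore Blog][3])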
**Implementation options:**
* **Option A: CLI wrapper**
* Use `rekor-cli` via a sidecar container.
* Call `rekor-cli upload` with the DSSE payload or Sigstore bundle.
* **Option B: Native HTTP client**
* Generate client from Rekor OpenAPI in .NET.
* Implement:
```csharp
public interface IRekorClient
{
    Task<RekorEntryRef> UploadDsseAsync(DsseEnvelope envelope, CancellationToken ct);
}

public record RekorEntryRef(
    string Uuid,
    long LogIndex,
    byte[] SignedEntryTimestamp);
```
4. Sigstore bundle support
* A **Sigstore bundle** packages:
* Verification material (cert, Rekor SET, timestamps),
* Signature content (DSSE envelope). ([Sigstore][12])
* You can:
* Store bundles alongside DSSE envelopes: `bundles/{artifactDigest}/{stage}.json`.
* Expose them in an API for offline verification.
5. After producing final verdict:
* Sign verdict statement.
* Upload verdict attestation (and optionally previous key attestations) to Rekor.
* Store Rekor entry ref (`uuid`, `index`, `SET`) in DB.
* Publish `verdict.attestation.created`.
**Definition of done**
* For a given artifact, you can:
* Retrieve a verdict DSSE envelope.
* Verify its signature and Rekor inclusion.
* Optionally retrieve a Sigstore bundle for fully offline verification.
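The policy evaluation in task 2 can start out as a plain function over already-verified inputs. The sketch below encodes the two example rules from the text ("no critical exploitable vulns", "score ≥ 70") and, as an assumption, denies when the VEX attestation is missing. `PolicySketch` and its parameters are hypothetical, and `quarantine` is left out for brevity.

```csharp
using System.Collections.Generic;

public static class PolicySketch
{
    // Returns "allow" only when no rule produced a reason; otherwise "deny" with the reasons.
    public static (string Decision, IReadOnlyList<string> Reasons) Evaluate(
        int criticalExploitable,
        double score,
        bool hasVex,
        double minimumScore = 70)
    {
        var reasons = new List<string>();

        // Assumption: a missing VEX attestation is treated as a hard failure.
        if (!hasVex)
            reasons.Add("VEX attestation missing");

        if (criticalExploitable > 0)
            reasons.Add($"{criticalExploitable} critical exploitable vulnerabilities");

        if (score < minimumScore)
            reasons.Add($"score {score} is below the minimum of {minimumScore}");

        return (reasons.Count == 0 ? "allow" : "deny", reasons);
    }
}
```

The returned reasons map directly onto `VerdictPredicateV1.Reasons`, so a consumer of the verdict attestation can see which rule caused a denial.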
---
## 4. Attestation Store & Data Model
Create an **“Attestation Service”** that all others depend on for reading/writing.
### 4.1 Database schema (simplified)
Relational schema example:
* `artifacts`
* `id` (PK)
* `algorithm`
* `digest`
* `name`
* `version`
* `attestations`
* `id` (PK)
* `artifact_id` (FK)
* `stage` (`scan`, `sbom`, `vex`, `score`, `verdict`)
* `predicate_type`
* `dsse_envelope_json`
* `created_at`
* `signer_key_id`
* `rekor_entries`
* `id` (PK)
* `attestation_id` (FK)
* `uuid`
* `log_index`
* `signed_entry_timestamp` (bytea)
* `sboms`
* `id`
* `artifact_id`
* `format` (CycloneDX)
* `version` (1.7)
* `location`
* `hash_algorithm`
* `hash_digest`
* `vex_documents`
* `id`
* `artifact_id`
* `format`
* `version`
* `location`
* `hash_algorithm`
* `hash_digest`
### 4.2 Attestation Service API
Provide a REST/gRPC API:
* `GET /artifacts/{algo}:{digest}/attestations`
* `GET /attestations/{id}`
* `GET /artifacts/{algo}:{digest}/sbom`
* `GET /artifacts/{algo}:{digest}/vex`
* `GET /artifacts/{algo}:{digest}/score`
* `GET /artifacts/{algo}:{digest}/bundle` (optional, Sigstore bundle)
**Definition of done**
* All other services call Attestation Service instead of touching the DB directly.
* You can fetch the full “attestation chain” for a given artifact from one place.
---
## 5. Observability & QA
### 5.1 Metrics
For each service:
* `attestations_emitted_total{stage}`
* `attestation_sign_errors_total{stage}`
* `rekor_upload_errors_total`
* `attestation_verification_failures_total`
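One way to emit these counters from .NET is the built-in `System.Diagnostics.Metrics` API, which exporters such as OpenTelemetry can pick up. In this sketch the meter name `StellaOps.Attestations` and the class name `AttestationMetrics` are assumptions; the instrument names follow the list above.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class AttestationMetrics
{
    // Hypothetical meter name; exporters subscribe to meters by name.
    public static readonly Meter Meter = new("StellaOps.Attestations");

    private static readonly Counter<long> Emitted =
        Meter.CreateCounter<long>("attestations_emitted_total");

    private static readonly Counter<long> SignErrors =
        Meter.CreateCounter<long>("attestation_sign_errors_total");

    // The stage is attached as a tag, which corresponds to the {stage} label above.
    public static void RecordEmitted(string stage) =>
        Emitted.Add(1, new KeyValuePair<string, object?>("stage", stage));

    public static void RecordSignError(string stage) =>
        SignErrors.Add(1, new KeyValuePair<string, object?>("stage", stage));
}
```

A service would call `AttestationMetrics.RecordEmitted("scan")` right after storing the envelope, and `RecordSignError` in the failure path of the signer.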
### 5.2 Tests
1. **Contract tests**
* JSON produced for `InTotoStatement` and `DsseEnvelope` is validated by:
* in-toto reference tooling.
* DSSE reference implementations. ([GitHub][1])
2. **End-to-end flow**
* Seed a mini pipeline with a test artifact:
* Build → Scan → SBOM → VEX → Score → Verdict.
* Use an external verifier (e.g., cosign, in-toto attestation verifier) to:
* Verify DSSE signatures.
* Verify Rekor entries and/or Sigstore bundles. ([Sigstore][6])
3. **Failure scenarios**
* Corrupt payload (verification must fail).
* Missing VEX (policy should deny or fall back to stricter rules).
* Rekor offline (system should continue but mark entries as “not logged”).
---
## 6. Phased Rollout Plan (High-Level)
You can translate this into epics:
1. **Epic 1 Core Attestation Platform**
* Implement `Attestations.Core` & `Attestations.Signing`.
* Implement Attestation Service + DB schema.
* Build small CLI / test harness.
2. **Epic 2 Scanner Integration**
* Normalize findings.
* Emit scan attestations only (no SBOM/VEX yet).
3. **Epic 3 CycloneDX SBOMs**
* Integrate CycloneDX .NET library.
* Generate 1.7 SBOMs for each artifact.
* Emit SBOM attestations.
4. **Epic 4 VEXer**
* Implement VEX derivation logic + CycloneDX VEX docs.
* Emit VEX attestations.
5. **Epic 5 Scorer & Policy**
* Implement scoring model v1.
* Implement policy engine.
* Emit Score + Verdict attestations.
6. **Epic 6 Rekor & Bundles**
* Stand up Rekor (or integrate with public instance).
* Implement Rekor client and Sigstore bundle support.
* Wire Attestor to log final (and optionally intermediate) attestations.
7. **Epic 7 UX & Docs**
* Build UI (or CLI) to visualize:
* Artifact → SBOM → VEX → Score → Verdict.
* Document how other teams integrate (what events to listen to, which APIs to call).
---
If you'd like, I can next:
* Turn this into **Jira-style epics & stories** with acceptance criteria; or
* Draft the actual **C# interfaces** and a project structure (`src/Attestations.Core`, `src/Attestations.Signing`, services, etc.).
[1]: https://github.com/secure-systems-lab/dsse "secure-systems-lab/dsse - Dead Simple Signing Envelope"
[2]: https://cyclonedx.org/news/cyclonedx-v1.7-released/ "CycloneDX v1.7 Delivers Advanced Cryptography, ..."
[3]: https://blog.sigstore.dev/rekor-v2-ga/ "Rekor v2 GA - Cheaper to run, simpler to maintain"
[4]: https://slsa.dev/blog/2023/05/in-toto-and-slsa "in-toto and SLSA"
[5]: https://cloud.google.com/dotnet/docs/reference/Grafeas.V1/latest/Grafeas.V1.Envelope "Grafeas v1 API - Class Envelope (3.10.0) | .NET client library"
[6]: https://docs.sigstore.dev/cosign/verifying/attestation/ "In-Toto Attestations"
[7]: https://github.com/CycloneDX/cyclonedx-dotnet-library "NET library to consume and produce CycloneDX Software ..."
[8]: https://www.nuget.org/packages/CycloneDX.Core/ "CycloneDX.Core 10.0.1"
[9]: https://cyclonedx.org/registry/cryptography/ "Cryptography Registry"
[10]: https://cyclonedx.org/capabilities/vex/ "Vulnerability Exploitability eXchange (VEX)"
[11]: https://docs.sigstore.dev/logging/overview/ "Rekor"
[12]: https://docs.sigstore.dev/about/bundle/ "Sigstore Bundle Format"


@@ -0,0 +1,590 @@
I'm sharing this because it highlights important recent developments with Rekor, and how its new v2 rollout and behavior with DSSE change what you need to watch out for when building attestations (for example in your StellaOps architecture).
### 🚨 What changed with Rekor v2
* Rekor v2 is now GA: it moves to a tile-backed transparency log backend (via the module rekor-tiles), which simplifies maintenance and lowers infrastructure cost. ([blog.sigstore.dev][1])
* The global publicly-distributed instance now supports only two entry types: `hashedrekord` (for artifacts) and `dsse` (for attestations). Many previously supported entry types (e.g. `intoto`, `rekord`, `helm`, `rfc3161`) have been removed. ([blog.sigstore.dev][1])
* The log is now sharded: instead of a single growing Merkle tree, multiple “shards” (trees) are used. This supports better scaling, simpler rotation/maintenance, and easier querying by tree shard + identifier. ([Sigstore][2])
### ⚠️ Why this matters for attestations, and common pitfalls
* Historically, when using DSSE or in-toto style attestations submitted to Rekor (or via Cosign), the **entire attestation payload** had to be uploaded to Rekor. That becomes problematic when payloads are large. There's a reported case where a 130 MB attestation was rejected due to size. ([GitHub][3])
* The public instance of Rekor historically had a relatively small attestation size limit (on the order of 100 KB) for uploads. ([GitHub][4])
* Because Rekor v2 no longer supports many entry types and simplifies the log types, you no longer have a fallback for some of the older attestation/storage formats if they don't fit DSSE/hashedrekord constraints. ([blog.sigstore.dev][1])
### ✅ What you must design for — and pragmatic workarounds
Given your StellaOps architecture goals (deterministic builds, reproducible scans, large SBOMs/metadata, private/air-gapped compliance), here's what you should consider:
* **Plan for payload-size constraints**: don't assume arbitrarily large attestations will be accepted. Keep attestation payloads small: ideally, put large blobs (e.g. full SBOMs, large metadata) **outside** DSSE and store them elsewhere (artifact storage, internal logs, blob store), with the attestation embedding only a hash or reference.
* **Use “private logs” / self-hosted Rekor** if you anticipate large payloads, since public instance limits make heavy payload uploads impractical. Running your own instance gives you control over size limits and resource allocation. ([GitHub][4])
* **Chunking / sharding**: For large metadata blobs, consider splitting (“sharding”) or chunking the data into smaller pieces, each with its own DSSE/hashedrekord entry, then reference or reassemble externally. This avoids hitting size limits while maintaining inclusion proofs.
* **Build idempotent re-submit logic**: Because DSSE/hashedrekord entries are the only supported types, and large payloads may fail, your pipelines (e.g. StellaOps) should handle retries and partial submits and remain idempotent, so re-submits don't create inconsistent or duplicate entries.
* **Persist full attestations outside Rekor**: Since Rekor v2 dropped many types and does not necessarily store full arbitrary blobs, ensure that the “source of truth” for large metadata remains under your control (e.g. in your internal storage), with Rekor storing only minimal hashed attestations.
### 🎯 What this means for StellaOps
For your “Proof-of-Integrity Graph” + “offline bundle + replayable scans” vision, this means you should treat Rekor (especially the public instance) as a *lightweight ledger for proofs*, not as a full-blown metadata store. In practice:
* Use Rekor for auditing, signature/inclusion proofs, and “commit-to-hash” attestations.
* Store full SBOMs, VEX data, scan metadata, large build/context blobs externally.
* Build your log-sharding, replay, and artifact-reference logic to survive Rekor's constraints.
* For air-gapped workflows, consider a private Rekor or alternative internal log that mirrors the public transparency-log guarantees but under your control.
If you like, I can pull up **real-world examples** of organizations that already adapted to Rekor v2's DSSE constraints (with chunking, private logs, hybrid storage); that could help shape StellaOps' resilience strategy.
[1]: https://blog.sigstore.dev/rekor-v2-ga/ "Rekor v2 GA - Cheaper to run, simpler to maintain"
[2]: https://docs.sigstore.dev/logging/sharding/ "Sharding"
[3]: https://github.com/sigstore/cosign/issues/3599 "Attestations require uploading entire payload to rekor #3599"
[4]: https://github.com/sigstore/rekor "sigstore/rekor: Software Supply Chain Transparency Log"
Here's a concrete, developer-friendly implementation plan you can hand to the team. I'll assume the context is “StellaOps + Sigstore/Rekor v2 + DSSE + air-gapped support”.
---
## 0. Shared context & constraints (what devs should keep in mind)
**Key facts (summarized):**
* Rekor v2 keeps only **two** entry types: `hashedrekord` (artifact signatures) and `dsse` (attestations). Older types (`intoto`, `rekord`, etc.) are gone. ([Sigstore Blog][1])
* The **public** Rekor instance enforces a ~**100 KB attestation size limit** per upload; bigger payloads must use your **own Rekor instance** instead. ([GitHub][2])
* For DSSE entries, Rekor **does not store the full payload**; it stores hashes and verification material. Users are expected to persist the attestations alongside artifacts in their own storage. ([Go Packages][3])
* People have already hit problems where ~130 MB attestations were rejected by Rekor, showing that “just upload the whole SBOM/provenance” is not sustainable. ([GitHub][4])
* Sigstore's **bundle** format is the canonical way to ship DSSE + tlog metadata around as a single JSON object (very useful for offline/air-gapped replay). ([Sigstore][5])
**Guiding principles for the implementation:**
1. **Rekor is a ledger, not a blob store.** We log *proofs* (hashes, inclusion proofs), not big documents.
2. **Attestation payloads live in our storage** (object store / DB).
3. **All Rekor interaction goes through one abstraction** so we can easily switch public/private/none.
4. **Everything is idempotent and replayable** (important for retries and air-gapped exports).
---
## 1. High-level architecture
### 1.1 Components
1. **Attestation Builder library (in CI/build tools)**
* Used by build pipelines / scanners / SBOM generators.
* Responsibilities:
* Collect artifact metadata (digest, build info, SBOM, scan results).
* Call Attestation API (below) with **semantic info** and raw payload(s).
2. **Attestation Service (core backend microservice)**
* Single entry point for creating and managing attestations.
* Responsibilities:
* Normalize incoming metadata.
* Store large payload(s) in object store.
* Construct **small DSSE envelope** (payload = manifest / summary, not giant blob).
* Persist attestation records & payload manifests in DB.
* Enqueue log-submission jobs for:
* Public Rekor v2
* Private Rekor v2 (optional)
* Internal event log (DB/Kafka)
* Produce **Sigstore bundles** for offline use.
3. **Log Writer / Rekor Client Worker(s)**
* Background workers consuming submission jobs.
* Responsibilities:
* Submit `dsse` (and optionally `hashedrekord`) entries to configured Rekor instances.
* Handle retries with backoff.
* Guarantee idempotency (no duplicate entries, no inconsistent state).
* Update DB with Rekor log index/uuid and status.
4. **Offline Bundle Exporter (CLI or API)**
* Runs in the air-gapped cluster.
* Responsibilities:
* Periodically export “new” attestations + bundles since last export.
* Materialize data as tar/zip with:
* Sigstore bundles (JSON)
* Chunk manifests
* Large payload chunks (optional, depending on policy).
5. **Offline Replay Service (connected environment)**
* Runs where internet access and public Rekor are available.
* Responsibilities:
* Read offline bundles from incoming location.
* Replay to:
* Public Rekor
* Cloud storage
* Internal observability
* Write updated status back (e.g., via a status file or callback).
6. **Config & Policy Layer**
* Central (e.g. YAML, env, config DB).
* Controls:
* Which logs to use: `public_rekor`, `private_rekor`, `internal_only`.
* Size thresholds (DSSE payload limit, chunk size).
* Retry/backoff policy.
* Air-gapped mode toggles.
---
## 2. Data model (DB + storage)
Use whatever DB you have (Postgres is fine). Here's a suggested schema; adapt as needed.
### 2.1 Core tables
**`attestations`**
| Column | Type | Description |
| ------------------------ | ----------- | ----------------------------------------- |
| `id` | UUID (PK) | Internal identifier |
| `subject_digest` | text | e.g., `sha256:<hex>` of build artifact |
| `subject_uri` | text | Optional URI (image ref, file path, etc.) |
| `predicate_type` | text | e.g. `https://slsa.dev/provenance/v1` |
| `payload_schema_version` | text | Version of our manifest schema |
| `dsse_envelope_digest` | text | `sha256` of DSSE envelope |
| `bundle_location` | text | URL/path to Sigstore bundle (if cached) |
| `created_at` | timestamptz | Creation time |
| `created_by` | text | Origin (pipeline id, service name) |
| `metadata` | jsonb | Extra labels / tags |
**`payload_manifests`**
| Column | Type | Description |
| --------------------- | ----------- | ------------------------------------------------- |
| `attestation_id` (FK) | UUID | Link to `attestations.id` |
| `total_size_bytes` | bigint | Size of the *full* logical payload |
| `chunk_count` | int | Number of chunks |
| `root_digest` | text | Digest of full payload or Merkle root over chunks |
| `manifest_json` | jsonb | The JSON we sign in the DSSE payload |
| `created_at` | timestamptz | |
**`payload_chunks`**
| Column | Type | Description |
| --------------------- | ----------------------------- | ---------------------- |
| `attestation_id` (FK) | UUID | |
| `chunk_index` | int | 0-based index |
| `chunk_digest` | text | sha256 of this chunk |
| `size_bytes` | bigint | Size of chunk |
| `storage_uri` | text | `s3://…` or equivalent |
| PRIMARY KEY | (attestation_id, chunk_index) | Ensures uniqueness |
**`log_submissions`**
| Column | Type | Description |
| --------------------- | ----------- | --------------------------------------------------------- |
| `id` | UUID (PK) | |
| `attestation_id` (FK) | UUID | |
| `target` | text | `public_rekor`, `private_rekor`, `internal` |
| `submission_key` | text | Idempotency key (see below) |
| `state` | text | `pending`, `in_progress`, `succeeded`, `failed_permanent` |
| `attempt_count` | int | For retries |
| `last_error` | text | Last error message |
| `rekor_log_index` | bigint | If applicable |
| `rekor_log_id` | text | Log ID (tree ID / key ID) |
| `created_at` | timestamptz | |
| `updated_at` | timestamptz | |
Add a **unique index** on `(target, submission_key)` to guarantee idempotency.
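As a rough illustration of how that unique index gives idempotent enqueueing, the sketch below uses Python's built-in `sqlite3` as a stand-in for Postgres (where `ON CONFLICT DO NOTHING` would play the role of `INSERT OR IGNORE`). The trimmed-down table and the `create_schema` / `enqueue_submission` helpers are illustrative assumptions, not existing StellaOps code.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a reduced log_submissions table (hypothetical subset of columns)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS log_submissions (
            id             INTEGER PRIMARY KEY,
            attestation_id TEXT NOT NULL,
            target         TEXT NOT NULL,
            submission_key TEXT NOT NULL,
            state          TEXT NOT NULL DEFAULT 'pending',
            attempt_count  INTEGER NOT NULL DEFAULT 0,
            UNIQUE (target, submission_key)
        )
        """
    )


def enqueue_submission(conn, attestation_id: str, target: str, submission_key: str) -> bool:
    """Insert a pending job; return True if a row was created, False if it already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO log_submissions (attestation_id, target, submission_key) "
        "VALUES (?, ?, ?)",
        (attestation_id, target, submission_key),
    )
    conn.commit()
    # rowcount is 0 when the unique constraint caused the insert to be skipped.
    return cur.rowcount == 1
```

Calling `enqueue_submission` twice with the same `(target, submission_key)` leaves a single row, so a retried API request cannot fan out into duplicate Rekor submissions.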
---
## 3. DSSE payload design (how to avoid size limits)
### 3.1 Manifest-based DSSE instead of giant payloads
Instead of DSSE-signing the **entire SBOM/provenance blob** (which hits Rekor's 100 KB limit), we sign a **manifest** describing where the payload lives and how to verify it.
**Example manifest JSON** (payload of DSSE, small):
```jsonc
{
"version": "stellaops.manifest.v1",
"subject": {
"uri": "registry.example.com/app@sha256:abcd...",
"digest": "sha256:abcd..."
},
"payload": {
"type": "sbom.spdx+json",
"rootDigest": "sha256:deadbeef...",
"totalSize": 73400320,
"chunkCount": 12
},
"chunks": [
{
"index": 0,
"digest": "sha256:1111...",
"size": 6291456
},
{
"index": 1,
"digest": "sha256:2222...",
"size": 6291456
}
// ...
],
"storagePolicy": {
"backend": "s3",
"bucket": "stellaops-attestations",
"pathPrefix": "sboms/app/abcd..."
}
}
```
* This JSON is small enough to **fit under 100 KB** as long as the chunk count stays moderate (each chunk entry costs roughly 100 bytes), so the DSSE envelope stays small.
* Full SBOM/scan results live in your object store; Rekor logs the DSSE envelope hash.
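On the consuming side, a verifier would fetch the chunks and check them against the signed manifest before trusting the reassembled payload. The following is a minimal sketch of that check, assuming `rootDigest` is a plain SHA-256 over the full payload (not a Merkle root) and that the DSSE signature over the manifest has already been verified elsewhere. `verify_payload` is a hypothetical helper name.

```python
import hashlib


def verify_payload(manifest: dict, chunks: list[bytes]) -> bytes:
    """Check fetched chunks against a manifest and return the reassembled payload.

    Raises ValueError on the first mismatch. `chunks` must be in index order.
    """
    expected = sorted(manifest["chunks"], key=lambda c: c["index"])
    if len(chunks) != manifest["payload"]["chunkCount"] or len(chunks) != len(expected):
        raise ValueError("chunk count does not match manifest")

    for entry, data in zip(expected, chunks):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        if digest != entry["digest"] or len(data) != entry["size"]:
            raise ValueError(f"chunk {entry['index']} failed verification")

    payload = b"".join(chunks)
    if len(payload) != manifest["payload"]["totalSize"]:
        raise ValueError("total size does not match manifest")
    root = "sha256:" + hashlib.sha256(payload).hexdigest()
    if root != manifest["payload"]["rootDigest"]:
        raise ValueError("root digest does not match manifest")
    return payload
```

Checking each chunk before the root digest means a corrupted download is attributed to a specific chunk, which can then be re-fetched on its own.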
### 3.2 Chunking logic (Attestation Service)
Config values (can be env vars):
* `CHUNK_SIZE_BYTES` = e.g. 5–10 MiB
* `SMALL_PAYLOAD_THRESHOLD` = e.g. 64 KB (payloads at or below this size are stored unchunked)
* `MAX_DSSE_PAYLOAD_BYTES` = e.g. 70 KiB (keeping margin under Rekor's 100 KB limit)
* `MAX_CHUNK_COUNT` = safety guard
Algorithm:
1. Receive raw payload bytes (SBOM / provenance / scan results).
2. Compute full `root_digest = sha256(payload_bytes)` (or Merkle root if you want more advanced verification).
3. If `len(payload_bytes) <= SMALL_PAYLOAD_THRESHOLD` (e.g. 64 KB):
* Skip chunking.
* Store payload as single object.
* Manifest can optionally omit `chunks` and just record one object.
4. If larger:
* Split into fixed-size chunks (except last).
* For each chunk:
* Compute `chunk_digest`.
* Upload chunk to object store path derived from `root_digest` + `chunk_index`.
* Insert `payload_chunks` rows.
5. Build manifest JSON with:
* `version`
* `subject`
* `payload` block
* `chunks[]` (no URIs if you don't want to leak details; the URIs can be derived by clients).
6. Check serialized manifest size ≤ `MAX_DSSE_PAYLOAD_BYTES`. If not:
* Option A: increase chunk size so you have fewer chunks.
* Option B: move chunk list to a secondary “chunk index” document and sign only its root digest.
7. DSSE-sign manifest JSON.
8. Persist DSSE envelope digest + manifest in DB.
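The chunking and manifest-building steps above could be sketched roughly as follows. This is only an in-memory illustration: object-store uploads, DB inserts and DSSE signing are left out, the root digest is a plain SHA-256 (not a Merkle root), and `build_manifest` with its parameters is a hypothetical helper rather than an existing API.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(
    payload: bytes,
    subject: dict,
    payload_type: str,
    chunk_size: int,
    small_threshold: int,
    max_manifest_bytes: int,
) -> tuple[dict, list[bytes]]:
    """Split a payload into chunks and build the small manifest that gets signed."""
    if len(payload) <= small_threshold:
        chunks = [payload]  # step 3: small payloads are stored as a single object
    else:
        # step 4: fixed-size chunks, the last one may be shorter
        chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

    manifest = {
        "version": "stellaops.manifest.v1",
        "subject": subject,
        "payload": {
            "type": payload_type,
            "rootDigest": sha256_hex(payload),
            "totalSize": len(payload),
            "chunkCount": len(chunks),
        },
        "chunks": [
            {"index": i, "digest": sha256_hex(c), "size": len(c)}
            for i, c in enumerate(chunks)
        ],
    }

    # step 6: refuse manifests that would not fit in the DSSE payload budget
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    if len(encoded) > max_manifest_bytes:
        raise ValueError(
            f"manifest is {len(encoded)} bytes, limit is {max_manifest_bytes}; "
            "increase the chunk size or move the chunk list to a secondary index"
        )
    return manifest, chunks
```

Serializing with sorted keys and no whitespace keeps the size check (and any digest taken over the manifest) independent of dictionary ordering.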
---
## 4. Rekor integration & idempotency
### 4.1 Rekor client abstraction
Implement an interface like:
```ts
interface TransparencyLogClient {
submitDsseEnvelope(params: {
dsseEnvelope: Buffer; // JSON bytes
subjectDigest: string;
predicateType: string;
}): Promise<{
logIndex: number;
logId: string;
entryUuid: string;
}>;
}
```
Provide implementations:
* `PublicRekorClient` (points at `https://rekor.sigstore.dev` or v2 equivalent).
* `PrivateRekorClient` (your own Rekor v2 cluster).
* `NullClient` (for internal-only mode).
Use official API semantics from Rekor OpenAPI / SDKs where possible. ([Sigstore][6])
### 4.2 Submission jobs & idempotency
**Submission key design:**
```text
submission_key = sha256(
"dsse" + "|" +
rekor_base_url + "|" +
dsse_envelope_digest
)
```
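A direct Python rendering of that key derivation might look like the sketch below. Stripping a trailing slash from the base URL is an added assumption (so that `https://host` and `https://host/` map to the same key); the function name is illustrative.

```python
import hashlib


def submission_key(rekor_base_url: str, dsse_envelope_digest: str) -> str:
    """Derive the idempotency key for one (log instance, envelope) pair."""
    # Assumption: normalize the URL so a trailing slash does not change the key.
    material = "|".join(["dsse", rekor_base_url.rstrip("/"), dsse_envelope_digest])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the log URL is part of the key, the same envelope can be submitted once to the public log and once to a private log without the two jobs colliding.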
Workflow in the worker:
1. Worker fetches `log_submissions` with `state = 'pending'` or due for retry.
2. Set `state = 'in_progress'` (optimistic update).
3. Call `client.submitDsseEnvelope`.
4. If success:
* Update `state = 'succeeded'`, set `rekor_log_index`, `rekor_log_id`.
5. If Rekor indicates “already exists” (or returns same logIndex for same envelope):
* Treat as success, update `state = 'succeeded'`.
6. On network/5xx errors:
* Increment `attempt_count`.
* If `attempt_count < MAX_RETRIES`: schedule retry with backoff.
* Else: `state = 'failed_permanent'`, keep `last_error`.
DB constraint: `UNIQUE(target, submission_key)` ensures we don't create conflicting jobs.
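The worker's state transitions can be sketched as a small pure function, which makes them easy to unit-test without a real Rekor instance. Everything here is a hedged illustration: the exception classes, the `Submission` dataclass, the exponential backoff parameters, and the choice to put a retryable job back into `pending` are assumptions, and `submit` stands in for a call to `client.submitDsseEnvelope`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AlreadyExists(Exception):
    """Hypothetical: raised when the log reports the entry is already present."""


class TransientError(Exception):
    """Hypothetical: raised on network failures or 5xx responses."""


@dataclass
class Submission:
    state: str = "pending"
    attempt_count: int = 0
    last_error: Optional[str] = None
    rekor_log_index: Optional[int] = None
    rekor_log_id: Optional[str] = None
    retry_after_seconds: Optional[float] = None


def backoff_seconds(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: 2s, 4s, 8s, ... capped at `cap`."""
    return min(cap, base * 2 ** (attempt - 1))


def process(sub: Submission, submit: Callable[[], dict], max_retries: int = 5) -> Submission:
    sub.state = "in_progress"
    try:
        entry = submit()
    except AlreadyExists:
        sub.state = "succeeded"  # step 5: a duplicate counts as success
        return sub
    except TransientError as exc:
        sub.attempt_count += 1
        sub.last_error = str(exc)
        if sub.attempt_count < max_retries:
            sub.state = "pending"
            sub.retry_after_seconds = backoff_seconds(sub.attempt_count)
        else:
            sub.state = "failed_permanent"
        return sub
    sub.state = "succeeded"
    sub.rekor_log_index = entry["logIndex"]
    sub.rekor_log_id = entry["logId"]
    return sub
```

In a real worker the updated `Submission` would be written back to `log_submissions` in the same transaction that claimed the job.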
---
## 5. Attestation Service API design
### 5.1 Create attestation (build/scan pipeline → Attestation Service)
**`POST /v1/attestations`**
**Request body (example):**
```json
{
"subject": {
"uri": "registry.example.com/app@sha256:abcd...",
"digest": "sha256:abcd..."
},
"payloadType": "sbom.spdx+json",
"payload": {
"encoding": "base64",
"data": "<base64-encoded-sbom-or-scan>"
},
"predicateType": "https://slsa.dev/provenance/v1",
"logTargets": ["internal", "private_rekor", "public_rekor"],
"airgappedMode": false,
"labels": {
"team": "payments",
"env": "prod"
}
}
```
**Server behavior:**
1. Validate subject & payload.
2. Chunk payload as per rules (section 3).
3. Store payload chunks.
4. Build manifest JSON & DSSE envelope.
5. Insert `attestations`, `payload_manifests`, `payload_chunks`.
6. For each `logTargets`:
* Insert `log_submissions` row with `state = 'pending'`.
7. Optionally construct Sigstore bundle representing:
* DSSE envelope
* Transparency log entry (when available) — for async, you can fill this later.
8. Return `202 Accepted` with resource URL:
```json
{
"attestationId": "1f4b3d...",
"status": "pending_logs",
"subjectDigest": "sha256:abcd...",
"logTargets": ["internal", "private_rekor", "public_rekor"],
"links": {
"self": "/v1/attestations/1f4b3d...",
"bundle": "/v1/attestations/1f4b3d.../bundle"
}
}
```
### 5.2 Get attestation status
**`GET /v1/attestations/{id}`**
Returns:
```json
{
"attestationId": "1f4b3d...",
"subjectDigest": "sha256:abcd...",
"predicateType": "https://slsa.dev/provenance/v1",
"logs": {
"internal": {
"state": "succeeded"
},
"private_rekor": {
"state": "succeeded",
"logIndex": 1234,
"logId": "..."
},
"public_rekor": {
"state": "pending",
"lastError": null
}
},
"createdAt": "2025-11-27T12:34:56Z"
}
```
### 5.3 Get bundle
**`GET /v1/attestations/{id}/bundle`**
* Returns a **Sigstore bundle JSON** that:
* Contains either:
* Only the DSSE + identity + certificate chain (if logs not yet written).
* Or DSSE + log entries (`hashedrekord` / `dsse` entries) for whichever logs are ready. ([Sigstore][5])
* This is what air-gapped exports and verifiers consume.
---
## 6. Air-gapped workflows
### 6.1 In the air-gapped environment
* Attestation Service runs in “air-gapped mode”:
* `logTargets` typically = `["internal", "private_rekor"]`.
* No direct public Rekor.
* **Offline Exporter CLI**:
```bash
stellaops-offline-export \
--since-id <last_exported_attestation_id> \
--output offline-bundle-<timestamp>.tar.gz
```
* Exporter logic:
1. Query DB for `attestations` created after the one identified by `since-id` (order by `created_at`, since UUID primary keys carry no ordering).
2. For each attestation:
* Fetch DSSE envelope.
* Fetch current log statuses (private rekor, internal).
* Build or reuse Sigstore bundle JSON.
* Optionally include payload chunks and/or original payload.
3. Write them into a tarball with structure like:
```
/attestations/<id>/bundle.json
/attestations/<id>/chunks/chunk-0000.bin
...
/meta/export-metadata.json
```
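A packaging step along these lines could be sketched with the standard library alone. The sketch goes one step beyond the text by making the tarball byte-for-byte reproducible (sorted members, zeroed timestamps), which fits the replayability goal but is an assumption, as are the `build_export` name, the input shape, and the fields written to `export-metadata.json`. Member names are relative (no leading slash), as is usual for tar archives.

```python
import gzip
import io
import json
import tarfile


def build_export(attestations: list[dict], exported_at: str) -> bytes:
    """Package bundles and chunks into a reproducible .tar.gz (returned as bytes).

    Each attestation is a dict with "id", "bundle" (JSON-serializable) and
    an optional "chunks" list of bytes.
    """
    entries: dict[str, bytes] = {}
    for att in attestations:
        base = f"attestations/{att['id']}"
        entries[f"{base}/bundle.json"] = json.dumps(
            att["bundle"], sort_keys=True, separators=(",", ":")
        ).encode("utf-8")
        for index, chunk in enumerate(att.get("chunks", [])):
            entries[f"{base}/chunks/chunk-{index:04d}.bin"] = chunk

    metadata = {
        "exportedAt": exported_at,
        "attestationIds": sorted(a["id"] for a in attestations),
    }
    entries["meta/export-metadata.json"] = json.dumps(metadata, sort_keys=True).encode("utf-8")

    raw = io.BytesIO()
    # mtime=0 keeps the gzip header stable across runs.
    with gzip.GzipFile(fileobj=raw, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for name in sorted(entries):
                info = tarfile.TarInfo(name=name)
                info.size = len(entries[name])
                info.mtime = 0
                info.mode = 0o644
                tar.addfile(info, io.BytesIO(entries[name]))
    return raw.getvalue()
```

With a reproducible archive, the operator moving tarballs between environments can compare a single SHA-256 on both sides of the air gap.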
### 6.2 In the connected environment
* **Replay Service**:
```bash
stellaops-offline-replay \
--input offline-bundle-<timestamp>.tar.gz \
--public-rekor-url https://rekor.sigstore.dev
```
* Replay logic:
1. Read each `/attestations/<id>/bundle.json`.
2. If `public_rekor` entry not present:
* Extract DSSE envelope from bundle.
* Call Attestation Service “import & log” endpoint or directly call PublicRekorClient.
* Build new updated bundle (with public tlog entry).
3. Emit an updated `result.json` for each attestation (so you can sync status back to original environment if needed).
---
## 7. Observability & ops
### 7.1 Metrics
Have devs expose at least:
* `rekor_submit_requests_total{target, outcome}`
* `rekor_submit_latency_seconds{target}` (histogram)
* `log_submissions_in_queue{target}`
* `attestations_total{predicateType}`
* `attestation_payload_bytes{bucket}` (distribution of payload sizes)
### 7.2 Logging
* Log at **info**:
* Attestation created (subject digest, predicateType, manifest version).
* Log submission succeeded (target, logIndex, logId).
* Log at **warn/error**:
* Any permanent failure.
* Any time DSSE payload nearly exceeds size threshold (to catch misconfig).
### 7.3 Feature flags
* `FEATURE_REKOR_PUBLIC_ENABLED`
* `FEATURE_REKOR_PRIVATE_ENABLED`
* `FEATURE_OFFLINE_EXPORT_ENABLED`
* `FEATURE_CHUNKING_ENABLED` (to allow a gradual rollout)
---
## 8. Concrete work breakdown for developers
You can basically drop this as a backlog outline:
1. **Domain model & storage**
* [ ] Implement DB migrations for `attestations`, `payload_manifests`, `payload_chunks`, `log_submissions`.
* [ ] Implement object storage abstraction and content-addressable layout for chunks.
2. **Attestation Service skeleton**
* [ ] Implement `POST /v1/attestations` with basic validation.
* [ ] Implement manifest building and DSSE envelope creation (no Rekor yet).
* [ ] Persist records in DB.
3. **Chunking & manifest logic**
* [ ] Implement chunker with thresholds & tests (small vs large).
* [ ] Implement manifest JSON builder.
* [ ] Ensure DSSE payload size is under configurable limit.
4. **Rekor client & log submissions**
* [ ] Implement `TransparencyLogClient` interface + Public/Private implementations.
* [ ] Implement `log_submissions` worker (queue + backoff + idempotency).
* [ ] Wire worker into service config and deployment.
5. **Sigstore bundle support**
* [ ] Implement bundle builder given DSSE envelope + log metadata.
* [ ] Add `GET /v1/attestations/{id}/bundle`.
6. **Offline export & replay**
* [ ] Implement Exporter CLI (queries DB, packages bundles and chunks).
* [ ] Implement Replay CLI/service (reads tarball, logs to public Rekor).
* [ ] Document operator workflow for moving tarballs between environments.
7. **Observability & docs**
* [ ] Add metrics, logs, and dashboards.
* [ ] Write verification docs: “How to fetch manifest, verify DSSE, reconstruct payload, and check Rekor.”
---
If you'd like, as a next step I can turn this into a stricter format your devs might already use (e.g. Jira epics + stories, or a design doc template with headers like “Motivation, Alternatives, Risks, Rollout Plan”).
[1]: https://blog.sigstore.dev/rekor-v2-ga/ "Rekor v2 GA - Cheaper to run, simpler to maintain"
[2]: https://github.com/sigstore/rekor "sigstore/rekor: Software Supply Chain Transparency Log"
[3]: https://pkg.go.dev/github.com/sigstore/rekor/pkg/types/dsse "dsse package - github.com/sigstore/rekor/pkg/types/dsse"
[4]: https://github.com/sigstore/cosign/issues/3599 "Attestations require uploading entire payload to rekor #3599"
[5]: https://docs.sigstore.dev/about/bundle/ "Sigstore Bundle Format"
[6]: https://docs.sigstore.dev/logging/overview/ "Rekor"

View File

@@ -0,0 +1,886 @@
Here's a concrete, low-lift way to boost StellaOps's visibility and prove your “deterministic, replayable” moat: publish a **sanitized subset of reachability graphs** as a public benchmark that others can run and score identically.
### What this is (plain English)
* You release a small, carefully scrubbed set of **packages + SBOMs + VEX + call graphs** (source & binaries) with **ground-truth reachability labels** for a curated list of CVEs.
* You also ship a **deterministic scoring harness** (container + manifest) so anyone can reproduce the exact scores, byte-for-byte.
### Why it helps
* **Proof of determinism:** identical inputs → identical graphs → identical scores.
* **Research magnet:** gives labs and tool vendors a neutral yardstick; you become “the” benchmark steward.
* **Biz impact:** easy demo for buyers; lets you publish leaderboards and whitepapers.
### Scope (MVP dataset)
* **Languages:** PHP, JS, Python, plus **binary** (ELF/PE/Mach-O) mini-cases.
* **Units:** 20–30 packages total; 3–6 CVEs per language; 4–6 binary cases (statically & dynamically linked).
* **Artifacts per unit:**
* Package tarball(s) or container image digest
* SBOM (CycloneDX 1.6 + SPDX 3.0.1)
* VEX (known-exploited, not-affected, under-investigation)
* **Call graph** (normalized JSON)
* **Ground truth**: list of vulnerable entrypoints/edges considered *reachable*
* **Determinism manifest**: feed URLs + rule hashes + container digests + tool versions
### Data model (keep it simple)
* `dataset.json`: index of cases with content-addressed URIs (sha256)
* `sbom/`, `vex/`, `graphs/`, `truth/` folders mirroring the index
* `manifest.lock.json`: DSSE-signed record of:
* feeder rules, lattice policies, normalizers (name + version + hash)
* container image digests for each step (scanner/cartographer/normalizer)
* timestamp + signer (StellaOps Authority)
### Scoring harness (deterministic)
* One Docker image: `stellaops/benchmark-harness:<tag>`
* Inputs: dataset root + `manifest.lock.json`
* Outputs:
* `scores.json` (precision/recall/F1, per-case and macro)
* `replay-proof.txt` (hashes of every artifact used)
* **No network** mode (offline-first). Fails closed if any hash mismatches.
### Metrics (clear + auditable)
* Per case: TP/FP/FN for **reachable** functions (or edges), plus optional **sink-reach** verification.
* Aggregates: micro/macro F1; “Determinism Index” (std-dev of repeated runs must be 0).
* **Repro test:** the harness re-runs N=3 times and asserts identical outputs (hash compare).
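The repro test can be sketched as a hash comparison over canonically serialized outputs. This is a minimal illustration, assuming the scoring run returns a JSON-serializable dict; `output_hash` and `repro_test` are hypothetical names, not part of an existing harness.

```python
import hashlib
import json
from typing import Callable


def output_hash(scores: dict) -> str:
    """Hash a scores document in canonical form (sorted keys, no whitespace)."""
    canonical = json.dumps(scores, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def repro_test(run_once: Callable[[], dict], runs: int = 3) -> str:
    """Run the scorer `runs` times and fail unless every output hashes identically."""
    hashes = {output_hash(run_once()) for _ in range(runs)}
    if len(hashes) != 1:
        raise RuntimeError(
            f"non-deterministic output: {len(hashes)} distinct hashes in {runs} runs"
        )
    return hashes.pop()
```

The returned hash is a natural candidate for inclusion in `replay-proof.txt`, since it summarizes the whole scores document in one value.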
### Sanitization & legal
* Strip any proprietary code/data; prefer OSS with permissive licenses.
* Replace real package registries with **local mirrors** and pin digests.
* Publish under **CC-BY-4.0** (data) + **Apache-2.0** (harness). Add a simple **contributor license agreement** for external case submissions.
### Baselines to include (neutral + useful)
* “Naïve reachable” (all functions in package)
* “Imports-only” (entrypoints that match import graph)
* “Call-depth-2” (bounded traversal)
* **Your** graph engine run with **frozen rules** from the manifest (as a reference, not a claim of SOTA)
### Repository layout (public)
```
stellaops-reachability-benchmark/
dataset/
dataset.json
sbom/...
vex/...
graphs/...
truth/...
manifest.lock.json (DSSE-signed)
harness/
Dockerfile
runner.py (CLI)
schema/ (JSON Schemas for graphs, truth, scores)
docs/
HOWTO.md (5-min run)
CONTRIBUTING.md
SANITIZATION.md
LICENSES/
```
### Docs your team can ship in a day
* **HOWTO.md:** `docker run -v $PWD/dataset:/d -v $PWD/out:/o stellaops/benchmark-harness score /d /o`
* **SCHEMA.md:** JSON Schemas for graph and truth (keep fields minimal: `nodes`, `edges`, `purls`, `sinks`, `evidence`).
* **REPRODUCIBILITY.md:** explains DSSE signatures, lockfile, and offline run.
* **LIMITATIONS.md:** clarifies scope (no dynamic runtime traces in v1, etc.).
### Governance (lightweight)
* **Versioned releases:** `v0.1`, `v0.2` with changelogs.
* **Submission gate:** PR template + CI that:
* validates schemas
* checks hashes match lockfile
* re-scores and compares to contributor's score
* **Leaderboard cadence:** monthly markdown table regenerated by CI.
### Launch plan (2-week sprint)
* **Day 1–2:** pick cases; finalize schemas; write SANITIZATION.md.
* **Day 3–5:** build harness image; implement deterministic runner; freeze `manifest.lock.json`.
* **Day 6–8:** produce ground truth; run baselines; generate initial scores.
* **Day 9–10:** docs + website README; record a 2-minute demo GIF.
* **Day 11–12:** legal review + licenses; create issue labels (“good first case”).
* **Day 13–14:** publish, post on GitHub + LinkedIn; invite Semgrep/Snyk/OSS-Fuzz folks to submit cases.
### Nice-to-have (but easy)
* **JSON Schema** for ground-truth edges so academics can auto-ingest.
* **Small “unknowns” registry** example to show how you annotate unresolved symbols without breaking determinism.
* **Binary mini-lab**: stripped vs non-stripped ELF pair to show your patch-oracle technique in action (truth labels reflect oracle result).
If you want, I can draft the repo skeleton (folders, placeholder JSON Schemas, a sample `manifest.lock.json`, and a minimal `runner.py` CLI) so you can drop it straight into GitHub.
Got you. Let's turn that high-level idea into something your devs can actually pick up and ship.
Below is a **concrete implementation plan** for the *StellaOps Reachability Benchmark* repo: directory structure, components, tasks, and acceptance criteria. You can drop this straight into a ticketing system as epics → stories.
---
## 0. Tech assumptions (adjust if needed)
To be specific, I'll assume:
* **Repo**: `stellaops-reachability-benchmark`
* **Harness language**: Python 3.11+
* **Packaging**: Docker image for the harness
* **Schemas**: JSON Schema (Draft 2020-12)
* **CI**: GitHub Actions
If your stack differs, you can still reuse the structure and acceptance criteria.
---
## 1. Repo skeleton & project bootstrap
**Goal:** Create a minimal but fully wired repo.
### Tasks
1. **Create skeleton**
* Structure:
```text
stellaops-reachability-benchmark/
dataset/
dataset.json
sbom/
vex/
graphs/
truth/
packages/
manifest.lock.json # initially stub
harness/
reachbench/
__init__.py
cli.py
dataset_loader.py
schemas/
graph.schema.json
truth.schema.json
dataset.schema.json
scores.schema.json
tests/
docs/
HOWTO.md
SCHEMA.md
REPRODUCIBILITY.md
LIMITATIONS.md
SANITIZATION.md
.github/
workflows/
ci.yml
pyproject.toml
README.md
LICENSE
Dockerfile
```
2. **Bootstrap Python project**
* `pyproject.toml` with:
* `reachbench` package
* deps: `jsonschema`, `click` or `typer`, `pyyaml`, `pytest`
* `harness/tests/` with a dummy test to ensure CI is green.
3. **Dockerfile**
* Minimal, pinned versions:
```Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir .
ENTRYPOINT ["reachbench"]
```
4. **CI basic pipeline (`.github/workflows/ci.yml`)**
* Jobs:
* `lint` (e.g., `ruff` or `flake8` if you want)
* `test` (pytest)
* `build-docker` (just to ensure Dockerfile stays valid)
### Acceptance criteria
* `pip install .` works locally.
* `reachbench --help` prints CLI help (even if commands are stubs).
* CI passes on main branch.
---
## 2. Dataset & schema definitions
**Goal:** Define all JSON formats and enforce them.
### 2.1 Define dataset index format (`dataset/dataset.json`)
**File:** `dataset/dataset.json`
**Example:**
```jsonc
{
"version": "0.1.0",
"cases": [
{
"id": "php-wordpress-5.8-cve-2023-12345",
"language": "php",
"kind": "source", // "source" | "binary" | "container"
"cves": ["CVE-2023-12345"],
"artifacts": {
"package": {
"path": "packages/php/wordpress-5.8.tar.gz",
"sha256": "…"
},
"sbom": {
"path": "sbom/php/wordpress-5.8.cdx.json",
"format": "cyclonedx-1.6",
"sha256": "…"
},
"vex": {
"path": "vex/php/wordpress-5.8.vex.json",
"format": "csaf-2.0",
"sha256": "…"
},
"graph": {
"path": "graphs/php/wordpress-5.8.graph.json",
"schema": "graph.schema.json",
"sha256": "…"
},
"truth": {
"path": "truth/php/wordpress-5.8.truth.json",
"schema": "truth.schema.json",
"sha256": "…"
}
}
}
]
}
```
### 2.2 Define **truth schema** (`harness/reachbench/schemas/truth.schema.json`)
**Model (conceptual):**
```jsonc
{
"case_id": "php-wordpress-5.8-cve-2023-12345",
"vulnerable_components": [
{
"cve": "CVE-2023-12345",
"symbol": "wp_ajax_nopriv_some_vuln",
"symbol_kind": "function", // "function" | "method" | "binary_symbol"
"status": "reachable", // "reachable" | "not_reachable"
"reachable_from": [
{
"entrypoint_id": "web:GET:/foo",
"notes": "HTTP route /foo"
}
],
"evidence": "manual-analysis" // or "unit-test", "patch-oracle"
}
],
"non_vulnerable_components": [
{
"symbol": "wp_safe_function",
"symbol_kind": "function",
"status": "not_reachable",
"evidence": "manual-analysis"
}
]
}
```
**Tasks**
* Implement JSON Schema capturing:
* required fields: `case_id`, `vulnerable_components`
* allowed enums for `symbol_kind`, `status`, `evidence`
* Add unit tests that:
* validate a valid truth file
* fail on various broken ones (missing `case_id`, unknown `status`, etc.)
### 2.3 Define **graph schema** (`harness/reachbench/schemas/graph.schema.json`)
**Model (conceptual):**
```jsonc
{
"case_id": "php-wordpress-5.8-cve-2023-12345",
"language": "php",
"nodes": [
{
"id": "func:wp_ajax_nopriv_some_vuln",
"symbol": "wp_ajax_nopriv_some_vuln",
"kind": "function",
"purl": "pkg:composer/wordpress/wordpress@5.8"
}
],
"edges": [
{
"from": "func:wp_ajax_nopriv_some_vuln",
"to": "func:wpdb_query",
"kind": "call"
}
],
"entrypoints": [
{
"id": "web:GET:/foo",
"symbol": "some_controller",
"kind": "http_route"
}
]
}
```
**Tasks**
* JSON Schema with:
* `nodes[]` (id, symbol, kind, optional purl)
* `edges[]` (`from`, `to`, `kind`)
* `entrypoints[]` (id, symbol, kind)
* Tests: verify a valid graph; invalid ones (missing `id`, unknown `kind`) are rejected.
### 2.4 Dataset index schema (`dataset.schema.json`)
* JSON Schema describing `dataset.json` (version string, cases array).
* Tests: validate the example dataset file.
### Acceptance criteria
* Running a simple script (will be `reachbench validate-dataset`) validates all JSON files in `dataset/` against schemas without errors.
* CI fails if any dataset JSON is invalid.
---
## 3. Lockfile & determinism manifest
**Goal:** Implement `manifest.lock.json` generation and verification.
### 3.1 Lockfile structure
**File:** `dataset/manifest.lock.json`
**Example:**
```jsonc
{
"version": "0.1.0",
"created_at": "2025-01-15T12:00:00Z",
"dataset": {
"root": "dataset/",
"sha256": "…",
"cases": {
"php-wordpress-5.8-cve-2023-12345": {
"sha256": "…"
}
}
},
"tools": {
"graph_normalizer": {
"name": "stellaops-graph-normalizer",
"version": "1.2.3",
"sha256": "…"
}
},
"containers": {
"scanner_image": "ghcr.io/stellaops/scanner@sha256:…",
"normalizer_image": "ghcr.io/stellaops/normalizer@sha256:…"
},
"signatures": [
{
"type": "dsse",
"key_id": "stellaops-benchmark-key-1",
"signature": "base64-encoded-blob"
}
]
}
```
*(Signatures can be optional in v1 but structure should be there.)*
### 3.2 `lockfile.py` module
**File:** `harness/reachbench/lockfile.py`
**Responsibilities**
* Compute deterministic SHA-256 digest of:
* each case's artifacts (path → hash from `dataset.json`)
* entire `dataset/` tree (sorted traversal)
* Generate new `manifest.lock.json`:
* `version` (hard-coded constant)
* `created_at` (UTC ISO 8601)
* `dataset` section with case hashes
* Verification:
* `verify_lockfile(dataset_root, lockfile_path)`:
* recompute hashes
* compare to `lockfile.dataset`
* return boolean + list of mismatches
**Tasks**
1. Implement canonical hashing:
* For text JSON files: normalize with:
* sort keys
* no whitespace
* UTF-8 encoding
* For binaries (packages): raw bytes.
2. Implement `compute_dataset_hashes(dataset_root)`:
* Returns `{"cases": {...}, "root_sha256": "…"}`.
3. Implement `write_lockfile(...)` and `verify_lockfile(...)`.
4. Tests:
* Two calls with same dataset produce identical lockfile (order of `cases` keys normalized).
* Changing any artifact file changes the root hash and causes verify to fail.
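The hashing tasks above can be sketched roughly as follows. This is a minimal sketch, not the harness implementation: it assumes strict JSON files (no comments), assumes each top-level directory under the dataset root is one case (the real mapping would come from `dataset.json`), and the helper names `canonical_json_bytes` and `file_sha256` are hypothetical.

```python
import hashlib
import json
from pathlib import Path

LOCKFILE_NAME = "manifest.lock.json"  # excluded so the lockfile never hashes itself


def canonical_json_bytes(obj) -> bytes:
    """Serialize with sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def file_sha256(path: Path) -> str:
    """Hash JSON files in canonical form and everything else as raw bytes."""
    data = path.read_bytes()
    if path.suffix == ".json":
        data = canonical_json_bytes(json.loads(data.decode("utf-8")))
    return hashlib.sha256(data).hexdigest()


def compute_dataset_hashes(dataset_root) -> dict:
    """Return per-case hashes plus a root hash over a sorted traversal."""
    root = Path(dataset_root)
    # Sort by POSIX-style relative path so traversal order is platform independent.
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    root_hasher = hashlib.sha256()
    case_hashers = {}
    for rel, path in files:
        if rel == LOCKFILE_NAME:
            continue
        line = f"{rel}\0{file_sha256(path)}\n".encode("utf-8")
        root_hasher.update(line)
        if "/" in rel:  # assumption: the first path component is the case id
            case_id = rel.split("/", 1)[0]
            case_hashers.setdefault(case_id, hashlib.sha256()).update(line)
    return {
        "cases": {cid: {"sha256": h.hexdigest()} for cid, h in sorted(case_hashers.items())},
        "root_sha256": root_hasher.hexdigest(),
    }
```

Because JSON is hashed after canonicalization, reformatting a truth file does not change its digest, while any change to a binary artifact changes both the case hash and the root hash.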
### 3.3 CLI commands
Add to `cli.py`:
* `reachbench compute-lockfile --dataset-root ./dataset --out ./dataset/manifest.lock.json`
* `reachbench verify-lockfile --dataset-root ./dataset --lockfile ./dataset/manifest.lock.json`
### Acceptance criteria
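* `reachbench compute-lockfile` generates a stable file (byte-for-byte identical across runs on an unchanged dataset; this requires `created_at` to be held fixed or excluded from the comparison).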
* `reachbench verify-lockfile` exits with:
* code 0 if matches
* non-zero if mismatch (plus human-readable diff).
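The comparison step behind `verify-lockfile` could look something like the sketch below. It is written as pure functions over already-loaded dictionaries; the names `diff_dataset_hashes` and `verify_exit_code` are hypothetical, and both inputs are assumed to follow the shape of the lockfile's `dataset` section (`sha256` plus `cases`).

```python
def diff_dataset_hashes(locked: dict, actual: dict) -> list[str]:
    """Compare the lockfile's `dataset` section with freshly recomputed hashes."""
    mismatches = []
    if locked.get("sha256") != actual.get("sha256"):
        mismatches.append("dataset root hash differs")
    locked_cases = locked.get("cases", {})
    actual_cases = actual.get("cases", {})
    # Sorted union keeps the report order stable between runs.
    for case_id in sorted(set(locked_cases) | set(actual_cases)):
        if case_id not in actual_cases:
            mismatches.append(f"{case_id}: in lockfile but missing from dataset")
        elif case_id not in locked_cases:
            mismatches.append(f"{case_id}: in dataset but not in lockfile")
        elif locked_cases[case_id]["sha256"] != actual_cases[case_id]["sha256"]:
            mismatches.append(f"{case_id}: hash differs")
    return mismatches


def verify_exit_code(locked: dict, actual: dict) -> int:
    """Print a human-readable diff and return 0 on match, 1 on mismatch."""
    mismatches = diff_dataset_hashes(locked, actual)
    for line in mismatches:
        print(line)
    return 0 if not mismatches else 1
```

Returning the mismatch list separately from the exit code keeps the "boolean + list of mismatches" contract from §3.2 easy to test.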
---
## 4. Scoring harness CLI
**Goal:** Deterministically score participant results against ground truth.
### 4.1 Result format (participant output)
**Expectation:**
Participants provide `results/` with one JSON per case:
```text
results/
php-wordpress-5.8-cve-2023-12345.json
js-express-4.17-cve-2022-9999.json
```
**Result file example:**
```jsonc
{
"case_id": "php-wordpress-5.8-cve-2023-12345",
"tool_name": "my-reachability-analyzer",
"tool_version": "1.0.0",
"predictions": [
{
"cve": "CVE-2023-12345",
"symbol": "wp_ajax_nopriv_some_vuln",
"symbol_kind": "function",
"status": "reachable"
},
{
"cve": "CVE-2023-12345",
"symbol": "wp_safe_function",
"symbol_kind": "function",
"status": "not_reachable"
}
]
}
```
### 4.2 Scoring model
* Treat scoring as classification over `(cve, symbol)` pairs.
* For each case:
* Truth positives: all `vulnerable_components` with `status == "reachable"`.
* Truth negatives: everything marked `not_reachable` (optional in v1).
* Predictions: all entries with `status == "reachable"`.
* Compute:
* `TP`: predicted reachable & truth reachable.
* `FP`: predicted reachable but truth says not reachable / unknown.
* `FN`: truth reachable but not predicted reachable.
* Metrics:
* Precision, Recall, F1 per case.
* Macro-averaged metrics across all cases.
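As a rough illustration of the scoring model, the per-case counts and metrics can be computed with set operations over `(cve, symbol)` pairs. This is a sketch under assumptions: the function takes plain iterables of pairs rather than the `TruthModel`/`PredictionModel` objects from §4.3, and returning `0.0` when a denominator is zero is a convention chosen here, not something the plan specifies.

```python
def compute_case_metrics(case_id, truth_reachable, predicted_reachable) -> dict:
    """Score one case; both inputs are iterables of (cve, symbol) pairs."""
    truth = set(truth_reachable)
    preds = set(predicted_reachable)
    tp = len(truth & preds)   # predicted reachable and truly reachable
    fp = len(preds - truth)   # predicted reachable, truth says otherwise or unknown
    fn = len(truth - preds)   # truly reachable but not predicted
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "case_id": case_id,
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

A missing results file maps naturally onto this function by passing an empty prediction set, which yields all false negatives.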
### 4.3 Implementation (`scoring.py`)
**File:** `harness/reachbench/scoring.py`
**Functions:**
* `load_truth(case_truth_path) -> TruthModel`
* `load_predictions(predictions_path) -> PredictionModel`
* `compute_case_metrics(truth, preds) -> dict`
* returns:
```python
{
"case_id": str,
"tp": int,
"fp": int,
"fn": int,
"precision": float,
"recall": float,
"f1": float
}
```
* `aggregate_metrics(case_metrics_list) -> dict`
* `macro_precision`, `macro_recall`, `macro_f1`, `num_cases`.
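Macro-averaging, as used by `aggregate_metrics`, simply means averaging the per-case values with equal weight per case. A minimal sketch, assuming each entry has the dictionary shape shown above; sorting by `case_id` before summing is an extra step added here so that floating-point summation order does not depend on input order.

```python
def aggregate_metrics(case_metrics_list) -> dict:
    """Macro-average precision, recall and F1 across cases."""
    # Fixed summation order keeps floating-point results reproducible.
    ordered = sorted(case_metrics_list, key=lambda m: m["case_id"])
    n = len(ordered)

    def mean(key: str) -> float:
        return sum(m[key] for m in ordered) / n if n else 0.0

    return {
        "macro_precision": mean("precision"),
        "macro_recall": mean("recall"),
        "macro_f1": mean("f1"),
        "num_cases": n,
    }
```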
### 4.4 CLI: `score`
**Signature:**
```bash
reachbench score \
--dataset-root ./dataset \
--results-root ./results \
--lockfile ./dataset/manifest.lock.json \
--out ./out/scores.json \
[--cases php-*] \
[--repeat 3]
```
**Behavior:**
1. **Verify lockfile** (fail closed if mismatch).
2. Load `dataset.json`, filter cases if `--cases` is set (glob).
3. For each case:
* Load truth file (and validate schema).
* Locate results file (`<case_id>.json`) under `results-root`:
* If missing, treat as all FN (or mark case as “no submission”).
* Load and validate predictions (include a JSON Schema: `results.schema.json`).
* Compute per-case metrics.
4. Aggregate metrics.
5. Write `scores.json`:
```jsonc
{
"version": "0.1.0",
"dataset_version": "0.1.0",
"generated_at": "2025-01-15T12:34:56Z",
"macro_precision": 0.92,
"macro_recall": 0.88,
"macro_f1": 0.90,
"cases": [
{
"case_id": "php-wordpress-5.8-cve-2023-12345",
"tp": 10,
"fp": 1,
"fn": 2,
"precision": 0.91,
"recall": 0.83,
"f1": 0.87
}
]
}
```
6. **Determinism check**:
* If `--repeat N` given:
* Re-run scoring in-memory N times.
* Compare resulting JSON strings (canonicalized via sorted keys).
* If any differ, exit non-zero with message (“non-deterministic scoring detected”).
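The `--repeat` check could be sketched as below. The `score_fn` callable is a hypothetical stand-in for one in-memory scoring pass; note that a field like `generated_at` would have to be fixed or left out of the compared payload, otherwise every run would differ.

```python
import json


def check_determinism(score_fn, repeat: int) -> str:
    """Run scoring `repeat` times and fail if the canonical JSON ever differs."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    runs = [
        json.dumps(score_fn(), sort_keys=True, separators=(",", ":"))
        for _ in range(repeat)
    ]
    if any(run != runs[0] for run in runs[1:]):
        # SystemExit with a message makes a CLI exit non-zero.
        raise SystemExit("non-deterministic scoring detected")
    return runs[0]
```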
### 4.5 Offline-only mode
* In `cli.py`, early check:
```python
import os

if os.getenv("REACHBENCH_OFFLINE_ONLY", "1") == "1":
    # Offline-only is the default. By policy, the harness never calls any
    # network libraries; in v1 this is enforced simply by not adding such calls.
    pass
```
* Document that harness must not reach out to the internet.
### Acceptance criteria
* Given a small artificial dataset with 2–3 cases and handcrafted results, `reachbench score` produces expected metrics (assert via tests).
* Running `reachbench score --repeat 3` produces identical `scores.json` across runs.
* Missing results files are handled gracefully (but clearly documented).
---
## 5. Baseline implementations
**Goal:** Provide in-repo baselines that use only the provided graphs (no extra tooling).
### 5.1 Baseline types
1. **Naïve reachable**: all symbols in the vulnerable package are considered reachable.
2. **Imports-only**: reachable = any symbol that:
* appears in the graph AND
* is reachable from any entrypoint by a single edge OR name match.
3. **Call-depth-2**:
* From each entrypoint, traverse up to depth 2 along `call` edges.
* Anything at depth ≤ 2 is considered reachable.
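The call-depth-2 baseline is essentially a bounded breadth-first search. The sketch below assumes the graph has already been reduced to a list of `(caller, callee)` pairs for `call` edges; the real graph schema is not shown in this section, so that input shape and the function name are assumptions.

```python
from collections import deque


def reachable_within_depth(call_edges, entrypoints, max_depth=2) -> set:
    """Return every symbol at call depth <= max_depth from any entrypoint."""
    adjacency = {}
    for caller, callee in call_edges:
        adjacency.setdefault(caller, []).append(callee)

    # Multi-source BFS: all entrypoints start at depth 0.
    depth = {ep: 0 for ep in entrypoints}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for callee in adjacency.get(node, []):
            if callee not in depth:
                depth[callee] = depth[node] + 1
                queue.append(callee)
    return set(depth)
```

Intersecting the returned set with the vulnerable symbols from the truth file would give the baseline's `reachable` predictions.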
### 5.2 Implementation
**File:** `harness/reachbench/baselines.py`
* `baseline_naive(graph, truth) -> PredictionModel`
* `baseline_imports_only(graph, truth) -> PredictionModel`
* `baseline_call_depth_2(graph, truth) -> PredictionModel`
**CLI:**
```bash
reachbench run-baseline \
--dataset-root ./dataset \
--baseline naive|imports|depth2 \
--out ./results-baseline-<baseline>/
```
Behavior:
* For each case:
* Load graph.
* Generate predictions per baseline.
* Write result file `results-baseline-<baseline>/<case_id>.json`.
### 5.3 Tests
* Tiny synthetic dataset in `harness/tests/data/`:
* 1–2 cases with simple graphs.
* Known expectations for each baseline (TP/FP/FN counts).
### Acceptance criteria
* `reachbench run-baseline --baseline naive` runs end-to-end and outputs results files.
* `reachbench score` on baseline results produces stable scores.
* Tests validate baseline behavior on synthetic cases.
---
## 6. Dataset validation & tooling
**Goal:** One command to validate everything (schemas, hashes, internal consistency).
### CLI: `validate-dataset`
```bash
reachbench validate-dataset \
--dataset-root ./dataset \
[--lockfile ./dataset/manifest.lock.json]
```
**Checks:**
1. `dataset.json` conforms to `dataset.schema.json`.
2. For each case:
* all artifact paths exist
* `graph` file passes `graph.schema.json`
* `truth` file passes `truth.schema.json`
3. Optional: verify lockfile if provided.
**Implementation:**
* `dataset_loader.py`:
* `load_dataset_index(path) -> DatasetIndex`
* `iter_cases(dataset_index)` yields case objects.
* `validate_case(case, dataset_root) -> list[str]` (list of error messages).
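A possible shape for `validate_case` is sketched below using only the standard library, so it covers the "paths exist" and "JSON parses" checks but not schema validation (which would need a JSON Schema library). The case layout, a dict with `id` and an `artifacts` mapping of name to relative path, is an assumption for illustration.

```python
import json
from pathlib import Path


def validate_case(case: dict, dataset_root) -> list[str]:
    """Return a list of error messages; an empty list means the case looks fine."""
    errors = []
    root = Path(dataset_root)
    # Sorted iteration keeps error output stable across runs.
    for name, rel_path in sorted(case.get("artifacts", {}).items()):
        path = root / rel_path
        if not path.is_file():
            errors.append(f"{case['id']}: {name} not found at {rel_path}")
            continue
        if path.suffix == ".json":
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError) as exc:
                errors.append(f"{case['id']}: {name} is not valid JSON ({exc})")
    return errors
```

The CLI would collect these lists across all cases, print them, and exit non-zero if any list is non-empty.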
**Acceptance criteria**
* Broken paths / invalid JSON produce a clear error message and non-zero exit code.
* CI job calls `reachbench validate-dataset` on every push.
---
## 7. Documentation
**Goal:** Make it trivial for outsiders to use the benchmark.
### 7.1 `README.md`
* Overview:
* What the benchmark is.
* What it measures (reachability precision/recall).
* Quickstart:
```bash
git clone ...
cd stellaops-reachability-benchmark
# Validate dataset
reachbench validate-dataset --dataset-root ./dataset
# Run baselines
reachbench run-baseline --baseline naive --dataset-root ./dataset --out ./results-naive
# Score baselines
reachbench score --dataset-root ./dataset --results-root ./results-naive --out ./out/naive-scores.json
```
### 7.2 `docs/HOWTO.md`
* Step-by-step:
* Installing harness.
* Running your own tool on the dataset.
* Formatting your `results/`.
* Running `reachbench score`.
* Interpreting `scores.json`.
### 7.3 `docs/SCHEMA.md`
* Human-readable description of:
* `graph` JSON
* `truth` JSON
* `results` JSON
* `scores` JSON
* Link to actual JSON Schemas.
### 7.4 `docs/REPRODUCIBILITY.md`
* Explain:
* lockfile design
* hashing rules
* deterministic scoring and `--repeat` flag
* how to verify you're using the exact same dataset.
### 7.5 `docs/SANITIZATION.md`
* Rules for adding new cases:
* Only use OSS or properly licensed code.
* Strip secrets / proprietary paths / user data.
* How to confirm nothing sensitive is in package tarballs.
### Acceptance criteria
* A new engineer (or external user) can go from zero to “I ran the baseline and got scores” by following docs only.
* All example commands work as written.
---
## 8. CI/CD details
**Goal:** Keep repo healthy and ensure determinism.
### CI jobs (GitHub Actions)
1. **`lint`**
* Run `ruff` / `flake8` (your choice).
2. **`test`**
* Run `pytest`.
3. **`validate-dataset`**
* Run `reachbench validate-dataset --dataset-root ./dataset`.
4. **`determinism`**
* Small workflow step:
* Run `reachbench score` on a tiny test dataset with `--repeat 3`.
* Assert success.
5. **`docker-build`**
* `docker build` the harness image.
### Acceptance criteria
* All jobs green on main.
* PRs show failing status if schemas or determinism break.
---
## 9. Rough “epics → stories” breakdown
You can paste roughly like this into Jira/Linear:
1. **Epic: Repo bootstrap & CI**
* Story: Create repo skeleton & Python project
* Story: Add Dockerfile & basic CI (lint + tests)
2. **Epic: Schemas & dataset plumbing**
* Story: Implement `truth.schema.json` + tests
* Story: Implement `graph.schema.json` + tests
* Story: Implement `dataset.schema.json` + tests
* Story: Implement `validate-dataset` CLI
3. **Epic: Lockfile & determinism**
* Story: Implement lockfile computation + verification
* Story: Add `compute-lockfile` & `verify-lockfile` CLI
* Story: Add determinism checks in CI
4. **Epic: Scoring harness**
* Story: Define results format + `results.schema.json`
* Story: Implement scoring logic (`scoring.py`)
* Story: Implement `score` CLI with `--repeat`
* Story: Add unit tests for metrics
5. **Epic: Baselines**
* Story: Implement naive baseline
* Story: Implement imports-only baseline
* Story: Implement depth-2 baseline
* Story: Add `run-baseline` CLI + tests
6. **Epic: Documentation & polish**
* Story: Write README + HOWTO
* Story: Write SCHEMA / REPRODUCIBILITY / SANITIZATION docs
* Story: Final repo cleanup & examples
---
If you tell me your preferred language and CI, I can also rewrite this into exact tickets and even starter code for `cli.py` and a couple of schemas.


@@ -0,0 +1,654 @@
Here's a small but high-impact product tweak: **add an immutable `graph_revision_id` to every call-graph page and API link**, so any result is citeable and reproducible across time.
---
### Why it matters (quick)
* **Auditability:** you can prove *which* graph produced a finding.
* **Reproducibility:** re-runs that change paths won't “move the goalposts.”
* **Support & docs:** screenshots/links in tickets point to an exact graph state.
### What to add
* **Stable anchor in all URLs:**
`https://…/graphs/{graph_id}?rev={graph_revision_id}`
`https://…/api/graphs/{graph_id}/nodes?rev={graph_revision_id}`
* **Opaque, content-addressed ID:** e.g., `graph_revision_id = blake3( sorted_edges + cfg + tool_versions + dataset_hashes )`.
* **First-class fields:** store `graph_id` (logical lineage), `graph_revision_id` (immutable), `parent_revision_id` (if derived), `created_at`, `provenance` (feed hashes, toolchain).
* **UI surfacing:** show a copy button “Rev: 8f2d…c9” on graph pages and in the “Share” dialog.
* **Diff affordance:** when `?rev=A` and `?rev=B` are both present, offer “Compare paths (A↔B).”
### Minimal API contract (suggested)
* `GET /api/graphs/{graph_id}` → latest + `latest_revision_id`
* `GET /api/graphs/{graph_id}/revisions/{graph_revision_id}` → immutable snapshot
* `GET /api/graphs/{graph_id}/nodes?rev=…` and `/edges?rev=…`
* `POST /api/graphs/{graph_id}/pin` with `{ graph_revision_id }` to mark “official”
* HTTP `Link` header on all responses:
`Link: <…/graphs/{graph_id}/revisions/{graph_revision_id}>; rel="version"`
### How to compute the revision id (deterministic)
* Inputs (all normalized): sorted node/edge sets; build config; tool+model versions; input artifacts (SBOM/VEX/feed) **by hash**; environment knobs (feature flags).
* Serialization: canonical JSON (UTF-8, ordered keys).
* Hash: BLAKE3 or SHA-256 → base58/hex (shortened in UI, full in API).
* Store alongside a manifest (so you can replay the graph later).
### Guardrails
* **Never reuse an ID** if any input bit differs.
* **Do not** make it guessable from business data (avoid leaking repo names, paths).
* **Break glass:** if a bad graph must be purged, keep the ID tombstoned (410 Gone) so references don't silently change.
### StellaOps touches (concrete)
* **Authority**: add `GraphRevisionManifest` (feeds, lattice/policy versions, scanners, in-toto/DSSE attestations).
* **Scanner/Vexer**: emit deterministic manifests and hand them to Authority for id derivation.
* **Ledger**: record `(graph_id, graph_revision_id, manifest_hash, signatures)`; expose audit query by `graph_revision_id`.
* **Docs & Support**: “Attach your `graph_revision_id`” line in issue templates.
### Tiny UX copy
* On graph page header: `Rev 8f2d…c9` · **Copy** · **Compare** · **Pin**
* Share dialog: “This link freezes today's state. New runs get a different rev.”
If you want, I can draft the DB table, the manifest JSON schema, and the exact URL/router changes for your .NET 10 services next.
Cool, let's turn this into something your engineers can actually pick up and implement.
Below is a concrete implementation plan broken down by phases, services, and tickets, with suggested data models, APIs, and tests.
---
## 0. Definitions (shared across teams)
* **Graph ID (`graph_id`)**: Logical identifier for a call graph lineage (e.g., “the call graph for build X of repo Y”).
* **Graph Revision ID (`graph_revision_id`)**: Immutable identifier for a specific snapshot of that graph, derived from a manifest (content-addressed hash).
* **Parent Revision ID (`parent_revision_id`)**: Previous revision in the lineage (if any).
* **Manifest**: Canonical JSON blob that describes *everything* that could affect graph structure or results:
* Nodes & edges
* Input feeds and their hashes (SBOM, VEX, scanner output, etc.)
* config/policies/feature flags
* tool + version (scanner, vexer, authority)
---
## 1. High-Level Architecture Changes
1. **Introduce `graph_revision_id` as a first-class concept** in:
* Graph storage / Authority
* Ledger / audit
* Backend APIs serving call graphs
2. **Derive `graph_revision_id` deterministically** from a manifest via a cryptographic hash.
3. **Expose revision in all graph-related URLs & APIs**:
* UI: `…/graphs/{graph_id}?rev={graph_revision_id}`
* API: `…/api/graphs/{graph_id}/revisions/{graph_revision_id}`
4. **Ensure immutability**: once a revision exists, it can never be updated in-place—only superseded by new revisions.
---
## 2. Backend: Data Model & Storage
### 2.1. Authority (graph source of truth)
**Goal:** Model graphs and revisions explicitly.
**New / updated tables (example in SQL-ish form):**
1. **Graphs (logical entity)**
```sql
CREATE TABLE graphs (
id UUID PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
latest_revision_id VARCHAR(128) NULL, -- FK into graph_revisions.id
label TEXT NULL, -- optional human label
metadata JSONB NULL
);
```
2. **Graph Revisions (immutable snapshots)**
```sql
CREATE TABLE graph_revisions (
id VARCHAR(128) PRIMARY KEY, -- graph_revision_id (hash)
graph_id UUID NOT NULL REFERENCES graphs(id),
parent_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
manifest JSONB NOT NULL, -- canonical manifest
provenance JSONB NOT NULL, -- tool versions, etc.
is_pinned BOOLEAN NOT NULL DEFAULT FALSE,
pinned_by UUID NULL, -- user id
pinned_at TIMESTAMPTZ NULL
);
CREATE INDEX idx_graph_revisions_graph_id ON graph_revisions(graph_id);
```
3. **Call Graph Data (if separate)**
If you store nodes/edges in separate tables, add a foreign key to `graph_revision_id`:
```sql
ALTER TABLE call_graph_nodes
ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
ALTER TABLE call_graph_edges
ADD COLUMN graph_revision_id VARCHAR(128) NULL REFERENCES graph_revisions(id);
```
> **Rule:** Nodes/edges for a revision are **never mutated**; a new revision means new rows.
---
### 2.2. Ledger (audit trail)
**Goal:** Every revision gets a ledger record for auditability.
**Table change or new table:**
```sql
CREATE TABLE graph_revision_ledger (
id BIGSERIAL PRIMARY KEY,
graph_revision_id VARCHAR(128) NOT NULL,
graph_id UUID NOT NULL,
manifest_hash VARCHAR(128) NOT NULL,
manifest_digest_algo TEXT NOT NULL, -- e.g., "BLAKE3"
authority_signature BYTEA NULL, -- optional
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_grl_revision ON graph_revision_ledger(graph_revision_id);
```
Ledger ingestion happens **after** a revision is stored in Authority, but **before** it is exposed as “current” in the UI.
---
## 3. Backend: Revision Hashing & Manifest
### 3.1. Define the manifest schema
Create a spec (e.g., JSON Schema) used by Scanner/Vexer/Authority.
**Example structure:**
```jsonc
{
"graph": {
"graph_id": "uuid",
"generator": {
"tool_name": "scanner",
"tool_version": "1.4.2",
"run_id": "some-run-id"
}
},
"inputs": {
"sbom_hash": "sha256:…",
"vex_hash": "sha256:…",
"repos": [
{
"name": "repo-a",
"commit": "abc123",
"tree_hash": "sha1:…"
}
]
},
"config": {
"policy_version": "2024-10-01",
"feature_flags": {
"new_vex_engine": true
}
},
"graph_content": {
"nodes": [
// nodes in canonical sorted order
],
"edges": [
// edges in canonical sorted order
]
}
}
```
**Key requirements:**
* All lists that affect the graph (`nodes`, `edges`, `repos`, etc.) must be **sorted deterministically**.
* Keys must be **stable** (no environment-dependent keys, no random IDs).
* All hashes of input artifacts must be included (not raw content).
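To make the "sorted deterministically" requirement concrete, here is a small sketch of normalizing `graph_content` before it goes into the manifest. The node and edge field names (`id`, `from`, `to`, `kind`) are hypothetical, since the manifest example leaves the node/edge shape open.

```python
def normalize_graph_content(nodes, edges) -> dict:
    """Sort nodes and edges by stable keys so input order never affects the manifest."""
    return {
        # Assumed: every node carries a unique, stable "id".
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        # Assumed: an edge is identified by (from, to, kind); kind is optional.
        "edges": sorted(edges, key=lambda e: (e["from"], e["to"], e.get("kind", ""))),
    }
```

With this in place, two pipeline runs that discover the same graph in a different traversal order still produce identical manifest content.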
### 3.2. Hash computation
Language-agnostic algorithm:
1. Normalize manifest to **canonical JSON**:
* UTF-8
* Sorted keys
* No extra whitespace
2. Hash the bytes using a cryptographic hash (BLAKE3 or SHA-256).
3. Encode as hex or base58 string.
**Pseudocode:**
```pseudo
function compute_graph_revision_id(manifest):
    canonical_json = canonical_json_encode(manifest) // sorted keys
    digest_bytes = BLAKE3(canonical_json)
    digest_hex = hex_encode(digest_bytes)
    return "grv_" + digest_hex[0:40] // prefix + shorten for UI
```
**Ticket:** Implement `GraphRevisionIdGenerator` library (shared):
* `Compute(manifest) -> graph_revision_id`
* `ValidateFormat(graph_revision_id) -> bool`
Make this a **shared library** across Scanner, Vexer, Authority to avoid divergence.
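For reference, the same algorithm as a runnable Python sketch. It swaps in SHA-256 from the standard library because BLAKE3 is not part of it, and it relies on `json.dumps` with sorted keys, which is a reasonable approximation of canonical JSON for simple manifests but not a full canonicalization scheme. The function names mirror the ticket above but are otherwise illustrative.

```python
import hashlib
import json
import re

_ID_PATTERN = re.compile(r"grv_[0-9a-f]{40}")


def compute_graph_revision_id(manifest: dict) -> str:
    """Derive a revision id from the canonical JSON form of the manifest."""
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    # SHA-256 stands in for BLAKE3 here; the prefix and 40-char cut follow the pseudocode.
    return "grv_" + hashlib.sha256(canonical).hexdigest()[:40]


def validate_format(graph_revision_id: str) -> bool:
    """Check the id has the `grv_` prefix followed by 40 lowercase hex characters."""
    return _ID_PATTERN.fullmatch(graph_revision_id) is not None
```

Key order in the input dictionary does not matter, but list order does, which is why nodes and edges must be sorted before the manifest is hashed.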
---
## 4. Backend: APIs
### 4.1. Graphs & revisions REST API
**New endpoints (example):**
1. **Get latest graph revision**
```http
GET /api/graphs/{graph_id}
Response:
{
"graph_id": "…",
"latest_revision_id": "grv_8f2d…c9",
"created_at": "…",
"metadata": { … }
}
```
2. **List revisions for a graph**
```http
GET /api/graphs/{graph_id}/revisions
Query: ?page=1&pageSize=20
Response:
{
"graph_id": "…",
"items": [
{
"graph_revision_id": "grv_8f2d…c9",
"created_at": "…",
"parent_revision_id": null,
"is_pinned": true
},
{
"graph_revision_id": "grv_3a1b…e4",
"created_at": "…",
"parent_revision_id": "grv_8f2d…c9",
"is_pinned": false
}
]
}
```
3. **Get a specific revision (snapshot)**
```http
GET /api/graphs/{graph_id}/revisions/{graph_revision_id}
Response:
{
"graph_id": "…",
"graph_revision_id": "…",
"created_at": "…",
"parent_revision_id": null,
"manifest": { … }, // optional: maybe not full content if large
"provenance": { … }
}
```
4. **Get nodes/edges for a revision**
```http
GET /api/graphs/{graph_id}/nodes?rev={graph_revision_id}
GET /api/graphs/{graph_id}/edges?rev={graph_revision_id}
```
Behavior:
* If `rev` is **omitted**, return the nodes/edges of the graph's `latest_revision_id`.
* If `rev` is **invalid or unknown**, return `404` (not fallback).
5. **Pin/unpin a revision (optional for v1)**
```http
POST /api/graphs/{graph_id}/pin
Body: { "graph_revision_id": "…" }
DELETE /api/graphs/{graph_id}/pin
Body: { "graph_revision_id": "…" }
```
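The `rev` resolution rule for the nodes/edges endpoints (omitted means latest, unknown means `404` with no fallback) can be sketched as a small pure function. The in-memory `graph` dictionary with `latest_revision_id` and `revisions` is a hypothetical stand-in for the Authority lookup.

```python
def resolve_revision(graph: dict, rev=None) -> tuple:
    """Return (http_status, revision_id) for a nodes/edges request."""
    if rev is None:
        # No `rev` given: serve whatever is currently the latest revision.
        return 200, graph["latest_revision_id"]
    if rev not in graph["revisions"]:
        # Unknown revision: never fall back silently to latest.
        return 404, None
    return 200, rev
```

Refusing to fall back is what keeps permalinks honest: a link either shows exactly the revision it names or fails visibly.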
### 4.2. Backward compatibility
* Existing endpoints like `GET /api/graphs/{graph_id}/nodes` should:
* Continue working with no `rev` param.
* Internally resolve to `latest_revision_id`.
* For old records with no revision:
* Create a synthetic manifest from current stored data.
* Compute a `graph_revision_id`.
* Store it and set `latest_revision_id` on the `graphs` row.
---
## 5. Scanner / Vexer / Upstream Pipelines
**Goal:** At the end of a graph build, they produce a manifest and a `graph_revision_id`.
### 5.1. Responsibilities
1. **Scanner/Vexer**:
* Gather:
* Tool name/version
* Input artifact hashes
* Feature flags / config
* Graph nodes/edges
* Construct manifest (according to schema).
* Compute `graph_revision_id` using shared library.
* Send manifest + revision ID to Authority via an internal API (`POST /internal/graphs/{graph_id}/revisions`, see §5.2).
2. **Authority**:
* Idempotently upsert:
* `graphs` (if new `graph_id`)
* `graph_revisions` row (if `graph_revision_id` not yet present)
* nodes/edges rows keyed by `graph_revision_id`.
* Update `graphs.latest_revision_id` to the new revision.
### 5.2. Internal API (Authority)
```http
POST /internal/graphs/{graph_id}/revisions
Body:
{
"graph_revision_id": "…",
"parent_revision_id": "…", // optional
"manifest": { … },
"provenance": { … },
"nodes": [ … ],
"edges": [ … ]
}
Response: 201 Created (or 200 if idempotent)
```
**Rules:**
* If `graph_revision_id` already exists for that `graph_id` with identical `manifest_hash`, treat as **idempotent**.
* If `graph_revision_id` exists but manifest hash differs → log and reject (bug in hashing).
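These idempotency rules boil down to a three-way decision. A minimal sketch, using a plain dictionary as a hypothetical stand-in for the `graph_revisions` table; the `409` status for the conflict case is an assumption, since the plan only says "log and reject".

```python
def store_revision(store: dict, graph_revision_id: str, manifest_hash: str) -> int:
    """Insert a revision idempotently and return the HTTP status to send."""
    existing = store.get(graph_revision_id)
    if existing is None:
        store[graph_revision_id] = manifest_hash
        return 201  # new revision created
    if existing == manifest_hash:
        return 200  # same id, same manifest: idempotent replay
    # Same id but a different manifest points to a hashing bug; never overwrite.
    return 409
```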
---
## 6. Frontend / UX Changes
Assuming a SPA (React/Vue/etc.), we'll treat these as tasks.
### 6.1. URL & routing
* **New canonical URL format** for graph UI:
* Latest: `/graphs/{graph_id}`
* Specific revision: `/graphs/{graph_id}?rev={graph_revision_id}`
* Router:
* Parse `rev` query param.
* If present, call `GET /api/graphs/{graph_id}/nodes?rev=…`.
* If not present, call same endpoint but without `rev` → backend returns latest.
### 6.2. Displaying revision info
* In graph page header:
* Show truncated revision:
* `Rev: 8f2d…c9`
* Buttons:
* **Copy** → Copies full `graph_revision_id`.
* **Share** → Copies full URL with `?rev=…`.
* Optional chip if pinned: `Pinned`.
**Example data model (TS):**
```ts
type GraphRevisionSummary = {
graphId: string;
graphRevisionId: string;
createdAt: string;
parentRevisionId?: string | null;
isPinned: boolean;
};
```
### 6.3. Revision list panel (optional but useful)
* Add a side panel or tab: “Revisions”.
* Fetch from `GET /api/graphs/{graph_id}/revisions`.
* Clicking a revision:
* Navigates to same page with `?rev={graph_revision_id}`.
* Preserves other UI state where reasonable.
### 6.4. Diff view (nice-to-have, can be v2)
* UX: “Compare with…” button in header.
* Opens dialog to pick a second revision.
* Backend: add a diff endpoint later, or compute diff client-side from node/edge lists if feasible.
---
## 7. Migration Plan
### 7.1. Phase 1: Schema & read-path ready
1. **Add DB columns/tables**:
* `graphs`, `graph_revisions`, `graph_revision_ledger`.
* `graph_revision_id` column to `call_graph_nodes` / `call_graph_edges`.
2. **Deploy with no behavior changes**:
* Default `graph_revision_id` columns NULL.
* Existing APIs continue to work.
### 7.2. Phase 2: Backfill existing graphs
1. Write a **backfill job**:
* For each distinct existing graph:
* Build a manifest from existing stored data.
* Compute `graph_revision_id`.
* Insert into `graphs` & `graph_revisions`.
* Update nodes/edges for that graph to set `graph_revision_id`.
* Set `graphs.latest_revision_id`.
2. Log any graphs that can't be backfilled (corrupt data, etc.) for manual review.
3. After backfill:
* Add **NOT NULL** constraint on `graph_revision_id` for nodes/edges (if practical).
* Ensure all public APIs can fetch revisions without changes from clients.
### 7.3. Phase 3: Wire up new pipelines
1. Update Scanner/Vexer to construct manifests and compute revision IDs.
2. Update Authority to accept `/internal/graphs/{graph_id}/revisions`.
3. Gradually roll out:
* Feature flag: `graphRevisionIdFromPipeline`.
* For flagged runs, use the new pipeline; for others, fall back to old + synthetic revision.
### 7.4. Phase 4: Frontend rollout
1. Update UI to:
* Read `rev` from URL (but not required).
* Show `Rev` in header.
* Use revision-aware endpoints.
2. Once stable:
* Update “Share” actions to always include `?rev=…`.
---
## 8. Testing Strategy
### 8.1. Unit tests
* **Hashing library**:
* Same manifest → same `graph_revision_id`.
* Different node ordering → same `graph_revision_id`.
* Tiny manifest change → different `graph_revision_id`.
* **Authority service**:
* Creating a revision stores `graph_revisions` + nodes/edges with matching `graph_revision_id`.
* Duplicate revision (same id + manifest) is idempotent.
* Conflicting manifest with same `graph_revision_id` is rejected.
### 8.2. Integration tests
* Scenario: “Create graph → view in UI”
* Pipeline produces manifest & revision.
* Authority persists revision.
* Ledger logs event.
* UI shows matching `graph_revision_id`.
* Scenario: “Stable permalinks”
* Capture a link with `?rev=…`.
* Rerun pipeline (new revision).
* Old link still shows original nodes/edges.
### 8.3. Migration tests
* On a sanitized snapshot:
* Run migration & backfill.
* Spot-check:
* Each `graph_id` has exactly one `latest_revision_id`.
* Node/edge counts before and after match.
* Manually recompute hash for a few graphs and compare to stored `graph_revision_id`.
---
## 9. Security & Compliance Considerations
* **Immutability guarantee**:
* Don't allow updates to `graph_revisions.manifest`.
* Any change must happen by creating a new revision.
* **Tombstoning** (for rare delete cases):
* If you must “remove” a bad graph, mark revision as `tombstoned` in an additional column and return `410 Gone` for that `graph_revision_id`.
* Never reuse that ID.
* **Access control**:
* Ensure revision APIs use the same ACLs as existing graph APIs.
* Don't leak manifests to users not allowed to see underlying artifacts.
---
## 10. Concrete Ticket Breakdown (example)
You can copy/paste this into your tracker and tweak.
1. **BE-01**: Add `graphs` and `graph_revisions` tables
* AC:
* Tables exist with fields above.
* Migrations run cleanly in staging.
2. **BE-02**: Add `graph_revision_id` to nodes/edges tables
* AC:
* Column added, nullable.
* No runtime errors in staging.
3. **BE-03**: Implement `GraphRevisionIdGenerator` library
* AC:
* Given a manifest, returns deterministic ID.
* Unit tests cover ordering, minimal changes.
4. **BE-04**: Implement `/internal/graphs/{graph_id}/revisions` in Authority
* AC:
* Stores new revision + nodes/edges.
* Idempotent on duplicate revisions.
5. **BE-05**: Implement public revision APIs
* AC:
* Endpoints in §4.1 available with Swagger.
* `rev` query param supported.
* Default behavior returns latest revision.
6. **BE-06**: Backfill existing graphs into `graph_revisions`
* AC:
* All existing graphs have `latest_revision_id`.
* Nodes/edges linked to a `graph_revision_id`.
* Metrics & logs generated for failures.
7. **BE-07**: Ledger integration for revisions
* AC:
* Each new revision creates a ledger entry.
* Query by `graph_revision_id` works.
8. **PIPE-01**: Scanner/Vexer manifest construction
* AC:
* Manifest includes all required fields.
* Values verified against Authority for a sample run.
9. **PIPE-02**: Scanner/Vexer computes `graph_revision_id` and calls Authority
* AC:
* End-to-end pipeline run produces a new `graph_revision_id`.
* Authority stores it and sets as latest.
10. **FE-01**: UI supports `?rev=` param and displays revision
* AC:
* When URL has `rev`, UI loads that revision.
* When no `rev`, loads latest.
* Rev appears in header with copy/share.
11. **FE-02**: Revision list UI (optional)
* AC:
* Revision panel lists revisions.
* Click navigates to appropriate `?rev=`.
---
If you'd like, I can next help you turn this into a very explicit design doc (with diagrams and exact JSON examples) or into ready-to-paste migration scripts / TypeScript interfaces tailored to your actual stack.


@@ -0,0 +1,696 @@
Here are some key developments in the software supply-chain and vulnerability-scoring world that you'll want on your radar.
---
## 1. CVSS v4.0: traceable scoring with richer context
* CVSS v4.0 was officially released by FIRST (Forum of Incident Response & Security Teams) on **November 1, 2023**. ([first.org][1])
* The specification now clearly divides metrics into four groups: Base, Threat, Environmental, and Supplemental. ([first.org][1])
* The National Vulnerability Database (NVD) has added support for CVSS v4.0, meaning newer vulnerability records can carry v4-style scores, vector strings and search filters. ([NVD][2])
* What's new/tangible: better granularity, explicit “Attack Requirements” and richer metadata to better reflect real-world contextual risk. ([Seemplicity][3])
* Why this matters: Enables more traceable evidence of how a score was derived (which metrics were used, in what context), supporting auditing, prioritisation and transparency.
**Takeaway for your world**: If you're leveraging vulnerability scanning, SBOM enrichment or compliance workflows (given your interest in SBOM/VEX/provenance), then moving to or supporting CVSS v4.0 ensures you have stronger traceability and richer scoring context that maps into policy, audit and remediation workflows.
---
## 2. CycloneDX v1.7: SBOM/VEX/provenance with cryptographic & IP transparency
* Version 1.7 of the SBOM standard from OWASP Foundation (CycloneDX) launched on **October 21, 2025**. ([CycloneDX][4])
* Key enhancements: *Cryptography Bill of Materials (CBOM)* support (listing algorithm families, elliptic curves, etc.) and *structured citations* (who provided component info, how, when) to improve provenance. ([CycloneDX][4])
* Provenance use cases: The spec enables declaring supplier/author/publisher metadata, component origin, external references. ([CycloneDX][5])
* Broadening scope: CycloneDX now supports not just SBOM (software), but hardware BOMs (HBOM), machine learning BOMs, cryptographic BOMs (CBOM) and supports VEX/attestation use cases. ([openssf.org][6])
* Why this matters: For your StellaOps architecture (with a strong emphasis on provenance, deterministic scans, trust frameworks) CycloneDX v1.7 provides native standard support for deeper audit-ready evidence, cryptographic algorithm visibility (which matters for crypto-sovereign readiness) and formal attestations/citations in the BOM.
**Takeaway**: Aligning your SBOM/VEX/provenance stack (e.g., scanner.webservice) to output CycloneDX v1.7-compliant artifacts means you jump ahead in terms of traceability, auditability and future-proofing (crypto and IP).
---
## 3. SLSA v1.2 Release Candidate 2: supply-chain build provenance standard
* On **November 10, 2025**, the Open Source Security Foundation (via the SLSA community) announced RC2 of SLSA v1.2, open for public comment until November 24, 2025. ([SLSA][7])
* What's new: Introduction of a *Source Track* (in addition to the Build Track) to capture source control provenance, distributed provenance, artifact attestations. ([SLSA][7])
* Specification clarifies provenance/attestation formats, how builds should be produced, distributed, verified. ([SLSA][8])
* Why this matters: SLSA gives you a standard framework for “I can trace this binary back to the code, the build system, the signer, the provenance chain,” which aligns directly with your strategic moats around deterministic replayable scans, proof-of-integrity graph, and attestations.
**Takeaway**: If you integrate SLSA v1.2 (once finalised) into StellaOps, you gain an industry-recognised standard for build provenance and attestation, complementing your SBOM/VEX and CVSS work.
---
### Why I'm sharing this with you
Given your interest in cryptographic-sovereign readiness, deterministic scanning, provenance and audit-grade supply-chain tooling (your StellaOps moat list), this trifecta (CVSS v4.0 + CycloneDX v1.7 + SLSA v1.2) represents the major standards you need to converge on. They each address different layers: vulnerability scoring, component provenance and build/trust chain assurance. Aligning all three will give you a strong governance and tooling stack.
If you like, I can pull together a detailed gap-analysis table (your current architecture versus what these standards demand) and propose roadmap steps for StellaOps to adopt them.
[1]: https://www.first.org/cvss/specification-document?utm_source=chatgpt.com "CVSS v4.0 Specification Document"
[2]: https://nvd.nist.gov/general/news/cvss-v4-0-official-support?utm_source=chatgpt.com "CVSS v4.0 Official Support - NVD"
[3]: https://seemplicity.io/blog/decoding-cvss-4-clarified-base-metrics/?utm_source=chatgpt.com "Decoding CVSS 4.0: Clarified Base Metrics"
[4]: https://cyclonedx.org/news/cyclonedx-v1.7-released/?utm_source=chatgpt.com "CycloneDX v1.7 Delivers Advanced Cryptography, ..."
[5]: https://cyclonedx.org/use-cases/provenance/?utm_source=chatgpt.com "Security Use Case: Provenance"
[6]: https://openssf.org/blog/2025/10/22/sboms-in-the-era-of-the-cra-toward-a-unified-and-actionable-framework/?utm_source=chatgpt.com "Global Alignment on SBOM Standards: How the EU Cyber ..."
[7]: https://slsa.dev/blog/2025/11/slsa-v1.2-rc2?utm_source=chatgpt.com "Announcing SLSA v1.2 Release Candidate 2"
[8]: https://slsa.dev/spec/v1.2-rc2/?utm_source=chatgpt.com "SLSA specification"
Cool, let's turn all that standards talk into something your engineers can actually build against.
Below is a concrete implementation plan, broken into three workstreams plus one cross-cutting track, each with phases, tasks and clear acceptance criteria:
* **A — CVSS v4.0 integration (scoring & evidence)**
* **B — CycloneDX 1.7 SBOM/CBOM + provenance**
* **C — SLSA 1.2 (build + source provenance)**
* **X — Crosscutting (APIs, UX, docs, rollout)**
I'll assume you have:
* A scanner / ingestion pipeline,
* A central data model (DB or graph),
* An API + UI layer (StellaOps console or similar),
* CI/CD on GitHub/GitLab/whatever.
---
## A. CVSS v4.0 integration
**Goal:** Your platform can ingest, calculate, store and expose CVSS v4.0 scores and vectors alongside (or instead of) v3.x, using the official FIRST spec and NVD data. ([FIRST][1])
### A1. Foundations & decisions
**Tasks**
1. **Pick canonical CVSSv4 library or implementation**
* Evaluate existing OSS libraries for your main language(s), or plan an internal one based directly on FIRST's spec (Base, Threat, Environmental, Supplemental groups).
* Decide:
* Supported metric groups (Base only vs. Base+Threat+Environmental+Supplemental).
* Which groups your UI will expose/edit vs. read-only from upstream feeds.
2. **Versioning strategy**
* Decide how to represent CVSS v3.0/v3.1/v4.0 in your DB:
* `vulnerability_scores` table with `version`, `vector`, `base_score`, `environmental_score`, `temporal_score`, `severity_band`.
* Define precedence rules: if both v3.1 and v4.0 exist, which one your “headline” severity uses.
**Acceptance criteria**
* Tech design doc reviewed & approved.
* Decision on library vs. custom implementation recorded.
* DB schema migration plan ready.
---
### A2. Data model & storage
**Tasks**
1. **DB schema changes**
* Add a `cvss_scores` table or expand the existing vulnerability table, e.g.:
```text
cvss_scores
id (PK)
vuln_id (FK)
source (enum: NVD, scanner, manual)
version (enum: 2.0, 3.0, 3.1, 4.0)
vector (string)
base_score (float)
temporal_score (float, nullable) // v2/v3 Temporal score; CVSS v4.0 renames this metric group to Threat
environmental_score (float, nullable)
severity (enum: NONE/LOW/MEDIUM/HIGH/CRITICAL)
metrics_json (JSONB) // raw metrics for traceability
created_at / updated_at
```
2. **Traceable evidence**
* Store:
* Raw CVSS vector string (e.g. `CVSS:4.0/AV:N/...(etc)`).
* Parsed metrics as JSON for audit (show “why” a score is what it is).
* Optional: add `calculated_by` + `calculated_at` for your internal scoring runs.
**Acceptance criteria**
* Migrations applied in dev.
* Read/write repository functions implemented and unit-tested.
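To make the "parsed metrics as JSON" idea concrete, here is a minimal sketch of splitting a vector string into the key/value pairs that could populate `metrics_json`. The `CvssVector` class is a hypothetical helper, not part of any existing StellaOps module, and it only handles vectors that start with a `CVSS:<version>` prefix (v3.x and v4.0); it does not validate metric names or values against the spec.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a CVSS vector string into its metric pairs.
public static class CvssVector
{
    // Returns the version and the metrics for input such as "CVSS:4.0/AV:N/AC:L".
    public static (string Version, IReadOnlyDictionary<string, string> Metrics) Parse(string vector)
    {
        if (string.IsNullOrWhiteSpace(vector))
            throw new FormatException("Empty CVSS vector.");

        string[] parts = vector.Trim().Split('/');
        if (!parts[0].StartsWith("CVSS:", StringComparison.Ordinal))
            throw new FormatException("Vector must start with 'CVSS:<version>'.");

        string version = parts[0].Substring("CVSS:".Length);
        var metrics = new Dictionary<string, string>(StringComparer.Ordinal);

        for (int i = 1; i < parts.Length; i++)
        {
            // Each segment is expected to look like "AV:N".
            string[] kv = parts[i].Split(':');
            if (kv.Length != 2 || kv[0].Length == 0 || kv[1].Length == 0)
                throw new FormatException($"Malformed metric '{parts[i]}'.");
            if (!metrics.TryAdd(kv[0], kv[1]))
                throw new FormatException($"Duplicate metric '{kv[0]}'.");
        }

        return (version, metrics);
    }
}
```

Storing the raw string next to the parsed dictionary keeps the audit trail intact even if the parser is later tightened.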
---
### A3. Ingestion & calculation
**Tasks**
1. **NVD / external feeds**
* Update your NVD ingestion to read CVSS v4.0 when present in JSON `metrics` fields. ([NVD][2])
* Map NVD → internal `cvss_scores` model.
2. **Local CVSSv4 calculator service**
* Implement a service (or module) that:
* Accepts metric values (Base/Threat/Environmental/Supplemental).
* Produces:
* Canonical vector.
* Base/Threat/Environmental scores.
* Severity band.
* Make this callable by:
* Scanner engine (calculating scores for private vulns).
* UI (recalculate button).
* API (for automated clients).
**Acceptance criteria**
* Given a set of reference vectors from FIRST, your calculator returns exact expected scores.
* NVD ingestion for a sample of CVEs produces v4 scores in your DB.
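The severity band is the simplest part of the calculator to pin down, because the qualitative rating scale is fixed by the CVSS specification (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). A minimal sketch follows; the type names are illustrative only, and the scoring of the vector itself is out of scope here.

```csharp
using System;

public enum CvssSeverity { None, Low, Medium, High, Critical }

// Hypothetical helper mapping a numeric CVSS score to its qualitative band.
public static class CvssSeverityBands
{
    public static CvssSeverity FromScore(double score)
    {
        if (double.IsNaN(score) || score < 0.0 || score > 10.0)
            throw new ArgumentOutOfRangeException(nameof(score), "CVSS scores range from 0.0 to 10.0.");

        // CVSS scores are published with one decimal place, so normalise first.
        double s = Math.Round(score, 1, MidpointRounding.AwayFromZero);

        if (s == 0.0) return CvssSeverity.None;
        if (s < 4.0) return CvssSeverity.Low;
        if (s < 7.0) return CvssSeverity.Medium;
        if (s < 9.0) return CvssSeverity.High;
        return CvssSeverity.Critical;
    }
}
```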
---
### A4. UI & API
**Tasks**
1. **API**
* Extend vulnerability API payload with:
```json
{
"id": "CVE-2024-XXXX",
"cvss": [
{
"version": "4.0",
"source": "NVD",
"vector": "CVSS:4.0/AV:N/...",
"base_score": 8.3,
"severity": "HIGH",
"metrics": { "...": "..." }
}
]
}
```
* Add filters: `cvss.version`, `cvss.min_score`, `cvss.severity`.
2. **UI**
* On vulnerability detail:
* Show v3.x and v4.0 side-by-side.
* Expandable panel with metric breakdown and “explain my score” text.
* On list views:
* Support sorting & filtering by v4.0 base score & severity.
**Acceptance criteria**
* Frontend can render v4.0 vectors and scores.
* QA can filter vulnerabilities using v4 metrics via API and UI.
---
### A5. Migration & rollout
**Tasks**
1. **Backfill**
* For all stored vulnerabilities where metrics exist:
* If v4 not present but inputs available, compute v4.
* Store both historical (v3.x) and new v4 for comparison.
2. **Feature flag / rollout**
* Introduce feature flag `cvss_v4_enabled` per tenant or environment.
* Run A/B comparison internally before enabling for all users.
**Acceptance criteria**
* Backfill job runs successfully on staging data.
* Rollout plan + rollback strategy documented.
---
## B. CycloneDX 1.7 SBOM/CBOM + provenance
CycloneDX 1.7 is now the current spec; it extends the Cryptography BOM (CBOM) support introduced in 1.6 and adds structured citations/provenance to strengthen trust and traceability. ([CycloneDX][3])
### B1. Decide scope & generators
**Tasks**
1. **Select BOM formats & languages**
* JSON as your primary format (`application/vnd.cyclonedx+json`). ([CycloneDX][4])
* Components you'll cover:
* Application BOMs (packages, containers).
* Optional: infrastructure (IaC, images).
* Optional: CBOM for crypto usage.
2. **Choose or implement generators**
* For each ecosystem (e.g., Maven, NPM, PyPI, containers), choose:
* Existing tools (`cyclonedx-maven-plugin`, `cyclonedx-npm`, etc).
* Or central generator using lockfiles/manifests.
**Acceptance criteria**
* Matrix of ecosystems → generator tool finalized.
* POC shows valid CycloneDX 1.7 JSON BOM for one representative project.
---
### B2. Schema alignment & validation
**Tasks**
1. **Model updates**
* Extend your internal SBOM model to include:
* `specVersion: "1.7"`
* `bomFormat: "CycloneDX"`
* `serialNumber` (UUID/URI).
* `metadata.tools` (how BOM was produced).
* `properties`, `licenses`, `cryptoProperties` (for CBOM).
* For provenance:
* `metadata.authors`, `metadata.manufacturer` (the older `metadata.manufacture` is deprecated), `metadata.supplier`.
* `components[x].evidence` and `components[x].properties` for evidence & citations. ([CycloneDX][5])
2. **Validation pipeline**
* Integrate the official CycloneDX JSON schema validation step into:
* CI (for projects generating BOMs).
* Your ingestion path (reject/flag invalid BOMs).
**Acceptance criteria**
* Any BOM produced must pass CycloneDX 1.7 JSON schema validation in CI.
* Ingestion rejects malformed BOMs with clear error messages.
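As an illustration of "clear error messages" on the ingestion path, the sketch below runs a cheap structural pre-check before full schema validation. It is not a substitute for validating against the official CycloneDX JSON schema; `BomPreCheck` and its messages are hypothetical, and only the three top-level fields named above are inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical pre-check run before full JSON schema validation.
public static class BomPreCheck
{
    public static IReadOnlyList<string> Check(string bomJson, string expectedSpecVersion = "1.7")
    {
        var errors = new List<string>();
        JsonDocument doc;
        try { doc = JsonDocument.Parse(bomJson); }
        catch (JsonException ex) { errors.Add("Not valid JSON: " + ex.Message); return errors; }

        using (doc)
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
            {
                errors.Add("BOM root must be a JSON object.");
                return errors;
            }

            if (GetString(root, "bomFormat") != "CycloneDX")
                errors.Add("bomFormat must be \"CycloneDX\".");
            if (GetString(root, "specVersion") != expectedSpecVersion)
                errors.Add("specVersion must be \"" + expectedSpecVersion + "\".");

            // serialNumber is optional, but when present it should be a urn:uuid: URI.
            string serial = GetString(root, "serialNumber");
            if (serial.Length > 0 && !serial.StartsWith("urn:uuid:", StringComparison.Ordinal))
                errors.Add("serialNumber should be a urn:uuid: URI.");
        }
        return errors;
    }

    // Returns the string value of a property, or "" when absent or not a string.
    private static string GetString(JsonElement obj, string name)
    {
        if (obj.TryGetProperty(name, out JsonElement el) && el.ValueKind == JsonValueKind.String)
            return el.GetString() ?? "";
        return "";
    }
}
```

Returning a list of messages rather than throwing on the first problem lets the API report every issue in one response.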
---
### B3. Provenance & citations in BOMs
**Tasks**
1. **Define provenance policy**
* Minimal set for every BOM:
* Author (CI system / team).
* Build pipeline ID, commit, repo URL.
* Build time.
* Extended:
* `externalReferences` for:
* Build logs.
* SLSA attestations.
* Security reports (e.g., scanner runs).
2. **Implement metadata injection**
* In your CI templates:
* Capture build info (commit SHA, pipeline ID, creator, environment).
* Add it into CycloneDX `metadata` and `properties`.
* For evidence:
* Use `components[x].evidence` to reference where a component was detected (e.g., file paths, manifest lines).
**Acceptance criteria**
* For any BOM, engineers can trace:
* WHO built it.
* WHEN it was built.
* WHICH repo/commit/pipeline it came from.
---
### B4. CBOM (Cryptography BOM) support (optional but powerful)
**Tasks**
1. **Crypto inventory**
* Scanner enhancement:
* Detect crypto libraries & primitives used (e.g., OpenSSL, bcrypt, TLS versions).
* Map them into CycloneDX CBOM structures (`cryptoProperties` on components of type `cryptographic-asset`, per spec).
2. **Policy hooks**
* Define policy checks:
* “Disallow SHA-1,”
* “Warn on RSA < 2048 bits,”
* “Flag non-FIPS-approved algorithms.”
**Acceptance criteria**
* From a BOM, you can list all cryptographic algorithms and libraries used in an application.
* At least one simple crypto policy implemented (e.g., SHA-1 usage alert).
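A first crypto policy can be very small. The sketch below covers two of the example checks (disallow SHA-1, warn on RSA below 2048 bits) over a flattened inventory. `CryptoAsset`, `CryptoFinding` and `CryptoPolicy` are hypothetical types for illustration; how algorithm names and key sizes are extracted from a CBOM is assumed to happen upstream.

```csharp
using System;
using System.Collections.Generic;

public enum PolicySeverity { Warn, Block }

// Hypothetical flattened view of one crypto asset found in a BOM.
public sealed record CryptoAsset(string Component, string Algorithm, int? KeySizeBits);

public sealed record CryptoFinding(PolicySeverity Severity, string Component, string Message);

public static class CryptoPolicy
{
    public static IReadOnlyList<CryptoFinding> Evaluate(IEnumerable<CryptoAsset> assets)
    {
        var findings = new List<CryptoFinding>();
        foreach (CryptoAsset asset in assets)
        {
            // Normalise spellings such as "sha-1", "SHA1" and " Sha-1 ".
            string alg = asset.Algorithm.Trim().ToUpperInvariant().Replace("-", "");

            if (alg == "SHA1")
            {
                findings.Add(new CryptoFinding(PolicySeverity.Block, asset.Component, "SHA-1 is disallowed."));
            }
            else if (alg == "RSA" && asset.KeySizeBits is int bits && bits < 2048)
            {
                findings.Add(new CryptoFinding(PolicySeverity.Warn, asset.Component,
                    "RSA key size " + bits + " is below 2048 bits."));
            }
        }
        return findings;
    }
}
```

A FIPS allow-list check would follow the same shape, with the approved algorithm set supplied as configuration rather than hard-coded.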
---
### B5. Ingestion, correlation & UI
**Tasks**
1. **Ingestion service**
* API endpoint: `POST /sboms` accepting CycloneDX 1.7 JSON.
* Store:
* Raw BOM (for evidence).
* Normalized component graph (packages, relationships).
* Link BOM to:
* Repo/project.
* Build (from SLSA provenance).
* Deployed asset.
2. **Correlation**
* Join SBOM components with:
* Vulnerability data (CVE/CWE/CPE/PURL).
* Crypto policy results.
* Maintain “asset → BOM → components → vulnerabilities” graph.
3. **UI**
* For any service/image:
* Show latest BOM metadata (CycloneDX version, timestamp).
* Component list with vulnerability badges.
* Crypto tab (if CBOM enabled).
* Provenance tab (author, build pipeline, SLSA attestation links).
**Acceptance criteria**
* Given an SBOM upload, the UI shows:
* Components.
* Associated vulnerabilities.
* Provenance metadata.
* API consumers can fetch SBOM + correlated risk in a single call.
---
## C. SLSA 1.2 build + source provenance
SLSA 1.2 (final) introduces a **Source Track** in addition to the Build Track, defining levels and attestation formats for both source control and build provenance. ([SLSA][6])
### C1. Target SLSA levels & scope
**Tasks**
1. **Choose target levels**
* For each critical product:
* Pick Build Track level (e.g., target L2 now, L3 later).
* Pick Source Track level (e.g., L1 for all, L2 for sensitive repos).
2. **Repo inventory**
* Classify repos by risk:
* Critical (agents, scanners, control-plane).
* Important (integrations).
* Low-risk (internal tools).
* Map target SLSA levels accordingly.
**Acceptance criteria**
* For every repo, there is an explicit target SLSA Build + Source level.
* Gap analysis doc exists (current vs target).
---
### C2. Build provenance in CI/CD
**Tasks**
1. **Attestation generation**
* For each CI pipeline:
* Use SLSA-compatible builders or tooling (e.g., `slsa-github-generator`, `slsa-framework` actions, Tekton Chains, etc.) to produce **build provenance attestations** in SLSA 1.2 format.
* Attestation content includes:
* Builder identity.
* Build inputs (commit, repo, config).
* Build parameters.
* Produced artifacts (digest, image tags).
2. **Signing & storage**
* Sign attestations (Sigstore/cosign or equivalent).
* Store:
* In an OCI registry (as artifacts).
* Or in a dedicated provenance store.
* Expose pointer to attestation in:
* BOM (`externalReferences`).
* Your StellaOps metadata.
**Acceptance criteria**
* For any built artifact (image/binary), you can retrieve a SLSA attestation proving:
* What source it came from.
* Which builder ran.
* What steps were executed.
---
### C3. Source Track controls
**Tasks**
1. **Source provenance**
* Implement controls to support SLSA Source Track:
* Enforce protected branches.
* Require code review (e.g., 2 reviewers) for main branches.
* Require signed commits for critical repos.
* Log:
* Author, reviewers, branch, PR ID, merge SHA.
2. **Source attestation**
* For each release:
* Generate **source attestations** capturing:
* Repo URL and commit.
* Review status.
* Policy compliance (review count, checks passing).
* Link these to build attestations (Source → Build provenance chain).
**Acceptance criteria**
* For a release, you can prove:
* Which reviews happened.
* Which branch strategy was followed.
* That policies were met at merge time.
---
### C4. Verification & policy in StellaOps
**Tasks**
1. **Verifier service**
* Implement a service that:
* Fetches SLSA attestations (source + build).
* Verifies signatures and integrity.
* Evaluates them against policies:
* “Artifact must have SLSA Build L2 attestation from trusted builders.”
* “Critical services must have Source L2 attestation (review, branch protections).”
2. **Runtime & deployment gates**
* Integrate verification into:
* Admission controller (Kubernetes or deployment gate).
* CI release stage (block promotion if SLSA requirements not met).
3. **UI**
* On artifact/service detail page:
* Surface SLSA level achieved (per track).
* Status (pass/fail).
* Drill-down view of attestation evidence (who built, when, from where).
**Acceptance criteria**
* A deployment can be blocked (in a test env) when SLSA requirements are not satisfied.
* Operators can visually see SLSA status for an artifact/service.
---
## X. Crosscutting: APIs, UX, docs, rollout
### X1. Unified data model & APIs
**Tasks**
1. **Graph relationships**
* Model the relationship:
* **Source repo** → **SLSA Source attestation**
→ **Build attestation** → **Artifact**
→ **SBOM (CycloneDX 1.7)** → **Components**
→ **Vulnerabilities (CVSS v4)**.
2. **Graph queries**
* Build API endpoints for:
* “Given a CVE, show all affected artifacts and their SLSA + BOM evidence.”
* “Given an artifact, show its full provenance chain and risk posture.”
**Acceptance criteria**
* At least 2 end-to-end queries work:
* CVE → impacted assets with scores + provenance.
* Artifact → SBOM + vulnerabilities + SLSA + crypto posture.
---
### X2. Observability & auditing
**Tasks**
1. **Audit logs**
* Log:
* BOM uploads and generators.
* SLSA attestation creation/verification.
* CVSS recalculations (who/what triggered them).
2. **Metrics**
* Track:
* % of builds with valid SLSA attestations.
* % artifacts with CycloneDX 1.7 BOMs.
* % vulns with v4 scores.
* Expose dashboards (Prometheus/Grafana or similar).
**Acceptance criteria**
* Dashboards exist showing coverage for:
* CVSSv4 adoption.
* CycloneDX 1.7 coverage.
* SLSA coverage.
---
### X3. Documentation & developer experience
**Tasks**
1. **Developer playbooks**
* Short, repo-friendly docs:
* “How to enable CycloneDX BOM generation in this repo.”
* “How to ensure your service reaches SLSA Build L2.”
* “How to interpret CVSS v4 in StellaOps.”
2. **Templates**
* CI templates:
* `bom-enabled-pipeline.yaml`
* `slsa-enabled-pipeline.yaml`
* Code snippets:
* API examples for pushing SBOMs.
* API examples for querying risk posture.
**Acceptance criteria**
* A new project can:
* Copy a CI template.
* Produce a validated CycloneDX 1.7 BOM.
* Generate SLSA attestations.
* Show up correctly in StellaOps with CVSS v4 scoring.
---
If you'd like, as a next step I can:
* Turn this into a **Jira-ready epic + stories breakdown**, or
* Draft concrete **API schemas (OpenAPI/JSON)** for SBOM ingestion, CVSS scoring, and SLSA attestation verification.
[1]: https://www.first.org/cvss/specification-document?utm_source=chatgpt.com "CVSS v4.0 Specification Document"
[2]: https://nvd.nist.gov/general/news/cvss-v4-0-official-support?utm_source=chatgpt.com "CVSS v4.0 Official Support - NVD"
[3]: https://cyclonedx.org/news/cyclonedx-v1.7-released/?utm_source=chatgpt.com "CycloneDX v1.7 Delivers Advanced Cryptography, ..."
[4]: https://cyclonedx.org/specification/overview/?utm_source=chatgpt.com "Specification Overview"
[5]: https://cyclonedx.org/docs/latest?utm_source=chatgpt.com "CycloneDX v1.7 JSON Reference"
[6]: https://slsa.dev/spec/v1.2/?utm_source=chatgpt.com "SLSA specification"

View File

@@ -0,0 +1,19 @@
Here's a quick sizing rule of thumb for Sigstore attestations so you don't hit Rekor limits.
* **Base64 bloat:** DSSE wraps your JSON statement and then Base64-encodes it. Base64 turns every 3 bytes into 4, so size ≈ `ceil(P/3)*4` (about **+33-37%** on top of your raw JSON). ([Stack Overflow][1])
* **DSSE envelope fields:** Expect a small extra overhead for JSON keys like `payloadType`, `payload`, and `signatures` (and the signature itself). Sigstore's bundle/DSSE examples show the structure used. ([Sigstore][2])
* **Public Rekor cap:** The **public Rekor instance rejects uploads over 100 KB**. If your DSSE (after Base64 + JSON fields) exceeds that, shard/split the attestation or run your own Rekor. ([GitHub][3])
* **Reality check:** Teams routinely run into size errors when large statements are uploaded—the whole DSSE payload is sent to Rekor during verification/ingest. ([GitHub][4])
### Practical guidance
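* Keep a **single attestation well under ~70-80 KB raw JSON** if it will be wrapped+Base64'd (gives headroom for signatures/keys). Treat 70 KB as the practical ceiling: 75 KB of raw JSON already Base64-encodes to 100 KB before any envelope fields are added.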
* Prefer **compact JSON** (no whitespace), **short key names**, and **avoid huge embedded fields** (e.g., trim SBOM evidence or link it by digest/URI).
* For big evidence sets, publish **multiple attestations** (logical shards) or **self-host Rekor**. ([GitHub][3])
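The sizing rule can be sketched as a small estimator. Two assumptions are baked in and should be adjusted to your setup: the 100 KB cap is read conservatively as 100,000 bytes, and the envelope overhead (JSON keys plus signature material) is a placeholder default of 512 bytes, which can be noticeably larger when certificates are embedded. `DsseSizeEstimator` is a hypothetical name.

```csharp
using System;

// Hypothetical estimator for the final DSSE size of a raw JSON statement.
public static class DsseSizeEstimator
{
    // Base64 output length with padding: ceil(rawBytes / 3) * 4.
    public static long Base64Length(long rawBytes) => (rawBytes + 2) / 3 * 4;

    // Encoded payload plus an assumed fixed overhead for envelope fields and signatures.
    public static long EstimateEnvelopeBytes(long payloadBytes, long envelopeOverheadBytes = 512)
        => Base64Length(payloadBytes) + envelopeOverheadBytes;

    // limitBytes defaults to 100,000 (assumption: conservative reading of "100 KB").
    public static bool FitsUnderLimit(long payloadBytes, long limitBytes = 100_000, long envelopeOverheadBytes = 512)
        => EstimateEnvelopeBytes(payloadBytes, envelopeOverheadBytes) <= limitBytes;
}
```

For example, `DsseSizeEstimator.FitsUnderLimit(70_000)` passes under these defaults while `FitsUnderLimit(80_000)` does not, which matches the guidance to stay at or below roughly 70 KB of raw JSON.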
If you want, I can add a tiny calculator snippet that takes your payload bytes and estimates the final DSSE+Base64 size vs. the 100KB limit.
[1]: https://stackoverflow.com/questions/4715415/base64-what-is-the-worst-possible-increase-in-space-usage?utm_source=chatgpt.com "Base64: What is the worst possible increase in space usage?"
[2]: https://docs.sigstore.dev/about/bundle/?utm_source=chatgpt.com "Sigstore Bundle Format"
[3]: https://github.com/sigstore/rekor?utm_source=chatgpt.com "sigstore/rekor: Software Supply Chain Transparency Log"
[4]: https://github.com/sigstore/cosign/issues/3599?utm_source=chatgpt.com "Attestations require uploading entire payload to rekor #3599"

View File

@@ -0,0 +1,913 @@
Here's a clear, SBOM-first blueprint you can drop into StellaOps without extra context.
---
# SBOM-first spine (with attestations): the short, practical version
![High-level flow](https://dummyimage.com/1200x300/ffffff/000000.png\&text=Scanner+→+Sbomer+→+Authority+→+Graphs+/%20APIs)
## Why this matters (plain English)
* **SBOMs** (CycloneDX/SPDX) = a complete parts list of your software.
* **Attestations** (in-toto + DSSE) = tamper-evident receipts proving *who did what, to which artifact, when, and how*.
* **Determinism** = if you re-scan tomorrow, you get the same result for the same inputs.
* **Explainability** = every risk decision links back to evidence you can show to auditors/customers.
---
## Core pipeline (modules & responsibilities)
1. **Scan (Scanner)**
* Inputs: container image / dir / repo.
* Outputs: raw facts (packages, files, symbols), and a **ScanEvidence** attestation (DSSE-wrapped in-toto statement).
* Must support offline feeds (bundle CVE/NVD/OSV/vendor advisories).
2. **Sbomer**
* Normalizes raw facts → **canonical SBOM** (CycloneDX or SPDX) with:
* PURLs, license info, checksums, build-IDs (ELF/PE/Mach-O), source locations.
* Emits **SBOMProduced** attestation linking SBOM ↔ image digest.
3. **Authority**
* Verifies every attestation chain (Sigstore/keys; PQ-ready option later).
* Stamps **PolicyVerified** attestation (who approved, policy hash, inputs).
* Persists **trust-log**: signatures, cert chains, Rekor-like index (mirrorable offline).
4. **Graph Store (Canonical Graph)**
* Ingests SBOM, vulnerabilities, reachability facts, VEX statements.
* Preserves **evidence links** (edge predicates: “found-by”, “reachable-via”, “proven-by”).
* Enables **deterministic replay** (snapshot manifests: feeds+rules+hashes).
---
## Stable APIs (keep these boundaries sharp)
* **/scan** → start scan; returns Evidence ID + attestation ref.
* **/sbom** → get canonical SBOM (by image digest or Evidence ID).
* **/attest** → submit/fetch attestations; verify chain; returns trust-proof.
* **/vex-gate** → policy decision: *allow / warn / block* with proof bundle.
* **/diff** → SBOM↔SBOM + SBOM↔runtime diffs (see below).
* **/unknowns** → create/list/resolve Unknowns (signals needing human/vendor input).
Design notes:
* All responses include `decision`, `explanation`, `evidence[]`, `hashes`, `clock`.
* Support **air-gap**: all endpoints operate on local bundles (ZIP/TAR with SBOM+attestations+feeds).
---
## Determinism & “Unknowns” (noise-killer loop)
**Smart diffs**
* **SBOM↔SBOM**: detect added/removed/changed components (by PURL+version+hash).
* **SBOM↔runtime**: prove reachability (e.g., symbol/function use, loaded libs, process maps).
* Score only on **provable** paths; gate on **VEX** (vendor/exploitability statements).
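The SBOM↔SBOM diff above can be sketched as a keyed comparison. One assumption to flag: `Purl` here is treated as the version-less package identity (for example `pkg:npm/lodash`), with version and hash carried separately, so that an upgrade shows up as "changed" rather than as one removal plus one addition. `SbomComponent`, `SbomDiff` and `SbomDiffer` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record SbomComponent(string Purl, string Version, string Hash);

public sealed record SbomDiff(
    IReadOnlyList<SbomComponent> Added,
    IReadOnlyList<SbomComponent> Removed,
    IReadOnlyList<(SbomComponent Before, SbomComponent After)> Changed);

public static class SbomDiffer
{
    // Note: ToDictionary throws if the same Purl appears twice in one SBOM.
    public static SbomDiff Diff(IEnumerable<SbomComponent> before, IEnumerable<SbomComponent> after)
    {
        var oldByPurl = before.ToDictionary(c => c.Purl, StringComparer.Ordinal);
        var newByPurl = after.ToDictionary(c => c.Purl, StringComparer.Ordinal);

        // Ordinal ordering keeps the diff output deterministic across runs.
        var added = newByPurl.Values
            .Where(c => !oldByPurl.ContainsKey(c.Purl))
            .OrderBy(c => c.Purl, StringComparer.Ordinal)
            .ToList();

        var removed = oldByPurl.Values
            .Where(c => !newByPurl.ContainsKey(c.Purl))
            .OrderBy(c => c.Purl, StringComparer.Ordinal)
            .ToList();

        var changed = newByPurl.Values
            .Where(c => oldByPurl.TryGetValue(c.Purl, out var o)
                        && (o.Version != c.Version || o.Hash != c.Hash))
            .OrderBy(c => c.Purl, StringComparer.Ordinal)
            .Select(c => (Before: oldByPurl[c.Purl], After: c))
            .ToList();

        return new SbomDiff(added, removed, changed);
    }
}
```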
**Unknowns handler**
* Any unresolved signal (ambiguous CVE mapping, stripped binary, unverified vendor VEX) → **Unknowns** queue:
* SLA, owner, evidence snapshot, audit trail.
* State machine: `new → triage → vendor-query → verified → closed`.
* Every VEX or vendor reply becomes an attestation; decisions re-evaluated deterministically.
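The Unknowns state machine can be sketched as an explicit transition table. This version follows the strictly linear chain given above; any shortcut (for example resolving directly from triage) would be an extra edge in the table and is deliberately left out. `UnknownState` and `UnknownWorkflow` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum UnknownState { New, Triage, VendorQuery, Verified, Closed }

public static class UnknownWorkflow
{
    // Allowed next states for each state: new -> triage -> vendor-query -> verified -> closed.
    private static readonly Dictionary<UnknownState, UnknownState[]> Allowed = new()
    {
        [UnknownState.New] = new[] { UnknownState.Triage },
        [UnknownState.Triage] = new[] { UnknownState.VendorQuery },
        [UnknownState.VendorQuery] = new[] { UnknownState.Verified },
        [UnknownState.Verified] = new[] { UnknownState.Closed },
        [UnknownState.Closed] = Array.Empty<UnknownState>(),
    };

    public static bool CanTransition(UnknownState from, UnknownState to)
        => Array.IndexOf(Allowed[from], to) >= 0;

    // Returns the new state, or throws so that illegal moves never reach the audit trail.
    public static UnknownState Transition(UnknownState from, UnknownState to)
    {
        if (!CanTransition(from, to))
            throw new InvalidOperationException("Illegal transition " + from + " -> " + to + ".");
        return to;
    }
}
```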
---
## What to store (so you can explain every decision)
* **Artifacts**: image digest, SBOM hash, feed versions, rule set hash.
* **Proofs**: DSSE envelopes, signatures, certs, inclusion proofs (Rekor-style).
* **Predicates (edges)**:
* `contains(component)`, `vulnerable_to(cve)`, `reachable_via(callgraph|runtime)`,
* `overridden_by(vex)`, `verified_by(authority)`, `derived_from(scan-evidence)`.
* **Why-strings**: human-readable proof trails (1-3 sentences) output with every decision.
---
## Minimal policies that work on day 1
* **Block** only when: `vuln.severity ≥ High` AND `reachable == true` AND `no VEX allows`.
* **Warn** when: `High/Critical` but `reachable == unknown` → route to Unknowns with SLA.
* **Allow** when: `Low/Medium` OR VEX says `not_affected` (trusted signer + policy).
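The three day-1 rules can be sketched as one pure function. The rules above do not say what happens to a High/Critical finding that is proven not reachable; this sketch assumes it is allowed, which you may want to tighten. The enum and class names are hypothetical, and `trustedVexNotAffected` stands for "a VEX statement from a trusted signer says `not_affected`".

```csharp
using System;

public enum Severity { Low, Medium, High, Critical }
public enum Reachability { Unknown, Reachable, NotReachable }
public enum GateOutcome { Allow, Warn, Block }

public static class DayOnePolicy
{
    public static GateOutcome Evaluate(Severity severity, Reachability reachability, bool trustedVexNotAffected)
    {
        // Allow: trusted VEX says not_affected, or severity is Low/Medium.
        if (trustedVexNotAffected) return GateOutcome.Allow;
        if (severity < Severity.High) return GateOutcome.Allow;

        // From here on the finding is High or Critical with no VEX override.
        return reachability switch
        {
            Reachability.Reachable => GateOutcome.Block,   // provable path: block
            Reachability.Unknown => GateOutcome.Warn,      // route to Unknowns with an SLA
            _ => GateOutcome.Allow,                        // assumption: proven unreachable is allowed
        };
    }
}
```

Keeping the function free of I/O means the same inputs always produce the same decision, which is what deterministic replay relies on.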
---
## Offline/air-gap bundle format (zip)
```
/bundle/
feeds/ (NVD, OSV, vendor) + manifest.json (hashes, timestamps)
sboms/ imageDigest.json
attestations/ *.jsonl (DSSE)
proofs/ rekor/ merkle.json
policy/ lattice.json
replay/ inputs.lock (content-hashes of everything above)
```
* Every API accepts `?bundle=/path/to/bundle.zip`.
* **Replay**: `inputs.lock` guarantees deterministic re-evaluation.
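One way to derive a single lock digest for the bundle is to hash each file, build a sorted manifest of `hash  path` lines, and hash the manifest. The manifest layout below is an assumption made for this sketch, not a defined StellaOps format, and `InputsLock` is a hypothetical name; the caller supplies relative paths and file contents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class InputsLock
{
    // files: relative path inside the bundle -> file content.
    public static string Compute(IEnumerable<KeyValuePair<string, byte[]>> files)
    {
        var manifest = new StringBuilder();

        // Ordinal path ordering makes the result independent of enumeration order.
        foreach (var file in files.OrderBy(f => f.Key, StringComparer.Ordinal))
        {
            string hex = Convert.ToHexString(SHA256.HashData(file.Value)).ToLowerInvariant();
            manifest.Append(hex).Append("  ").Append(file.Key).Append('\n');
        }

        byte[] manifestBytes = Encoding.UTF8.GetBytes(manifest.ToString());
        return "sha256:" + Convert.ToHexString(SHA256.HashData(manifestBytes)).ToLowerInvariant();
    }
}
```

Any change to a feed, SBOM, attestation or policy file changes the digest, so a stored decision can be tied to exactly the inputs that produced it.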
---
## .NET 10 implementation sketch (pragmatic)
* **Contracts**: `StellaOps.Contracts.*` (Scan, Attest, VexGate, Diff, Unknowns).
* **Attestations**: `StellaOps.Attest.Dsse` (IEnvelope, IStatement<TPredicate>); pluggable crypto (FIPS/GOST/SM/PQ).
* **SBOM**: `StellaOps.Sbom` (CycloneDX/SPDX models + mappers; PURL utilities).
* **Graph**: `StellaOps.Graph` (EF Core 9/10 over Mongo/Postgres; edge predicates as enums + JSON evidence).
* **Policy/Lattice**: `StellaOps.Policy.Lattice` (pure functions over graph snapshots; produce Decision+Why).
* **Unknowns**: `StellaOps.Unknowns` (aggregate root; SLA timers; audit events).
* **CLI**: `stella scan|sbom|attest|vex-gate|diff|unknowns` with `--bundle`.
Key patterns:
* All public methods return `(Result<T> value, Proof proof)`.
* Hash everything (inputs, rules, feeds) and bake into `Proof`.
* Emit DSSE envelopes at each hop; verify on ingest; never trust unauthenticated input.
---
## Next steps (week-one checklist)
* Define JSON schemas for: **Decision** (with Why+Proof), **Unknown**, **DiffResult**.
* Stand up **/scan**, **/sbom**, **/attest**, **/vex-gate** skeletons returning mock Proofs.
* Implement **SBOM↔SBOM diff** (PURL+version+hash) and wire to **/diff**.
* Create **bundle.zip** reader/writer + `inputs.lock`.
* Add **Authority.Verify()** with DSSE verification + trust-store.
* Ship a basic **policy.lattice.json** (3 rules above) and e2e test: image → allow/block with proofs.
If you want, I can generate:
* The exact C# interfaces/classes (namespaces + DTOs),
* JSON Schemas for Decision/Proof/Unknowns,
* A seed `bundle.zip` layout with fake data for local e2e.
Cool, let's turn that blueprint into something your devs can actually build from.
Below is a detailed, opinionated implementation plan you can paste into an engineering doc and assign as epics / tickets.
---
## 0. Solution layout (concrete)
**Repo structure**
```text
/stella-ops
/src
StellaOps.Contracts // DTOs, API contracts, JSON schemas
StellaOps.Domain // Core domain types (ArtifactId, Proof, Decision, etc.)
StellaOps.Attest // DSSE envelopes, in-toto statements, signing/verification
StellaOps.Sbom // SBOM models + normalization
StellaOps.Graph // Graph store, entities, queries
StellaOps.Policy // Policy engine (lattice evaluation)
StellaOps.WebApi // HTTP APIs: /scan, /sbom, /attest, /vex-gate, /diff, /unknowns
StellaOps.Cli // `stella` CLI, offline bundles
/tests
StellaOps.Tests.Unit
StellaOps.Tests.Integration
StellaOps.Tests.E2E
```
**Baseline tech assumptions**
* Runtime: .NET (8+; you can call it “.NET 10” in your roadmap).
* API: ASP.NET Core minimal APIs.
* DB: Postgres (via EF Core) for graph + unknowns + metadata.
* Storage: local filesystem / S3-compatible for bundle zips, scanner DB caches.
* External scanners: Trivy / Grype / Syft (invoked via CLI with deterministic config).
---
## 1. Core domain & shared contracts (Phase 1)
**Goal:** Have a stable core domain + contracts that all teams can build against.
### 1.1 Core domain types (`StellaOps.Domain`)
Implement:
```csharp
public readonly record struct Digest(string Algorithm, string Value); // e.g. ("sha256", "abcd...")
public readonly record struct ArtifactRef(string Kind, string Value);
// Kind: "container-image", "file", "package", "sbom", etc.
public readonly record struct EvidenceId(Guid Value);
public readonly record struct AttestationId(Guid Value);
public enum PredicateType
{
ScanEvidence,
SbomProduced,
PolicyVerified,
VulnerabilityFinding,
ReachabilityFinding,
VexStatement
}
public sealed class Proof
{
public string ProofId { get; init; } = default!;
public Digest InputsLock { get; init; } = default!; // hash of feeds+rules+sbom bundle
public DateTimeOffset EvaluatedAt { get; init; }
public IReadOnlyList<string> EvidenceIds { get; init; } = Array.Empty<string>();
public IReadOnlyDictionary<string,string> Meta { get; init; } = new Dictionary<string,string>();
}
```
### 1.2 Attestation model (`StellaOps.Attest`)
Implement DSSE + in-toto abstractions:
```csharp
public sealed class DsseEnvelope
{
public string PayloadType { get; init; } = default!;
public string Payload { get; init; } = default!; // base64url(JSON)
public IReadOnlyList<DsseSignature> Signatures { get; init; } = Array.Empty<DsseSignature>();
}
public sealed class DsseSignature
{
public string KeyId { get; init; } = default!;
public string Sig { get; init; } = default!; // base64url
}
public interface IStatement<out TPredicate>
{
string Type { get; } // in-toto type URI
string PredicateType { get; } // URI or enum -> string
TPredicate Predicate { get; }
string Subject { get; } // e.g., image digest
}
```
Attestation services:
```csharp
public interface IAttestationSigner
{
Task<DsseEnvelope> SignAsync<TPredicate>(IStatement<TPredicate> statement, CancellationToken ct);
}
public interface IAttestationVerifier
{
Task VerifyAsync(DsseEnvelope envelope, CancellationToken ct);
}
```
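One detail the interfaces above leave implicit is which bytes the signer actually signs. In the DSSE specification the signature covers the pre-authentication encoding (PAE) of the payload type and the raw payload bytes, not the Base64 text stored in the envelope. A minimal sketch of that encoding follows; `DssePae` is a hypothetical helper name, and key handling is out of scope.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DssePae
{
    // PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    // where LEN is the byte length written as ASCII decimal.
    public static byte[] Encode(string payloadType, byte[] payload)
    {
        byte[] typeBytes = Encoding.UTF8.GetBytes(payloadType);

        string header = string.Concat(
            "DSSEv1 ",
            typeBytes.Length.ToString(CultureInfo.InvariantCulture), " ",
            payloadType, " ",
            payload.Length.ToString(CultureInfo.InvariantCulture), " ");

        byte[] headerBytes = Encoding.UTF8.GetBytes(header);
        byte[] result = new byte[headerBytes.Length + payload.Length];
        Buffer.BlockCopy(headerBytes, 0, result, 0, headerBytes.Length);
        Buffer.BlockCopy(payload, 0, result, headerBytes.Length, payload.Length);
        return result;
    }
}
```

An `IAttestationSigner` implementation would sign `DssePae.Encode(payloadType, payloadBytes)` and a verifier would rebuild the same bytes from the decoded envelope before checking the signature.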
### 1.3 Decision & VEX-gate contracts (`StellaOps.Contracts`)
```csharp
public enum GateDecisionKind
{
Allow,
Warn,
Block
}
public sealed class GateDecision
{
public GateDecisionKind Decision { get; init; }
public string Reason { get; init; } = default!; // short human-readable
public Proof Proof { get; init; } = default!;
public IReadOnlyList<string> Evidence { get; init; } = Array.Empty<string>(); // EvidenceIds / AttestationIds
}
public sealed class VexGateRequest
{
public ArtifactRef Artifact { get; init; }
public string? Environment { get; init; } // "prod", "staging", cluster id, etc.
public string? BundlePath { get; init; } // optional offline bundle path
}
```
**Acceptance criteria**
* Shared projects compile.
* No service references each other directly (only via Contracts + Domain).
* Example test that serializes/deserializes GateDecision and DsseEnvelope using System.Text.Json.
---
## 2. SBOM pipeline (Scanner → Sbomer) (Phase 2)
**Goal:** For a container image, produce a canonical SBOM + attestation deterministically.
### 2.1 Scanner integration (`StellaOps.WebApi` + `StellaOps.Cli`)
#### API contract (`/scan`)
```csharp
public sealed class ScanRequest
{
public string SourceType { get; init; } = default!; // "container-image" | "directory" | "git-repo"
public string Locator { get; init; } = default!; // e.g. "registry/myapp:1.2.3"
public bool IncludeFiles { get; init; } = true;
public bool IncludeLicenses { get; init; } = true;
public string? BundlePath { get; init; } // for offline data
}
public sealed class ScanResponse
{
public EvidenceId EvidenceId { get; init; }
public AttestationId AttestationId { get; init; }
public Digest ArtifactDigest { get; init; } = default!;
}
```
#### Implementation steps
1. **Scanner abstraction**
```csharp
public interface IArtifactScanner
{
Task<ScanResult> ScanAsync(ScanRequest request, CancellationToken ct);
}
public sealed class ScanResult
{
public ArtifactRef Artifact { get; init; } = default!;
public Digest ArtifactDigest { get; init; } = default!;
public IReadOnlyList<DiscoveredPackage> Packages { get; init; } = Array.Empty<DiscoveredPackage>();
public IReadOnlyList<DiscoveredFile> Files { get; init; } = Array.Empty<DiscoveredFile>();
}
```
2. **CLI wrapper** (Trivy/Grype/Syft):
* Implement `SyftScanner : IArtifactScanner`:
* Invoke external CLI with fixed flags.
* Use JSON output mode.
* Resolve CLI path from config.
* Ensure deterministic:
* Disable auto-updating DB.
* Use a local DB path versioned and optionally included into bundle.
* Write parsing code Syft → `ScanResult`.
* Add retry & clear error mapping (timeout, auth error, network error).
3. **/scan endpoint**
* Validate request.
* Call `IArtifactScanner.ScanAsync`.
* Build a `ScanEvidence` predicate:
```csharp
public sealed class ScanEvidencePredicate
{
public ArtifactRef Artifact { get; init; } = default!;
public Digest ArtifactDigest { get; init; } = default!;
public DateTimeOffset ScannedAt { get; init; }
public string ScannerName { get; init; } = default!;
public string ScannerVersion { get; init; } = default!;
public IReadOnlyList<DiscoveredPackage> Packages { get; init; } = Array.Empty<DiscoveredPackage>();
}
```
* Build in-toto statement for predicate.
* Call `IAttestationSigner.SignAsync`, persist:
* Raw envelope to `attestations` table.
* Map to `EvidenceId` + `AttestationId`.
**Acceptance criteria**
* Given a fixed image and fixed scanner DB, repeated `/scan` calls produce identical:
* `ScanResult` (up to ordering).
* `ScanEvidence` payload.
* `InputsLock` proof hash (once implemented).
* E2E test: run scan on a small public image in CI using a pre-bundled scanner DB.
---
### 2.2 Sbomer (`StellaOps.Sbom` + `/sbom`)
**Goal:** Normalize `ScanResult` into a canonical SBOM (CycloneDX/SPDX) + emit SBOM attestation.
#### Models
Create neutral SBOM model (internal):
```csharp
public sealed class CanonicalComponent
{
public string Name { get; init; } = default!;
public string Version { get; init; } = default!;
public string Purl { get; init; } = default!;
public string? License { get; init; }
public Digest Digest { get; init; } = default!;
public string? SourceLocation { get; init; } // file path, layer info
}
public sealed class CanonicalSbom
{
public string SbomId { get; init; } = default!;
public ArtifactRef Artifact { get; init; } = default!;
public Digest ArtifactDigest { get; init; } = default!;
public IReadOnlyList<CanonicalComponent> Components { get; init; } = Array.Empty<CanonicalComponent>();
public DateTimeOffset CreatedAt { get; init; }
public string Format { get; init; } = "CycloneDX-JSON-1.5"; // default
}
```
#### Sbomer service
```csharp
public interface ISbomer
{
CanonicalSbom FromScan(ScanResult scan);
string ToCycloneDxJson(CanonicalSbom sbom);
string ToSpdxJson(CanonicalSbom sbom);
}
```
Implementation details:
* Map OS/deps to PURLs (use existing PURL libs or implement minimal helpers).
* Stable ordering:
* Sort components by `Purl` then `Version` before serialization.
* Hash the SBOM JSON → `Digest` (e.g., `Digest("sha256", "...")`).
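The ordering and hashing rules above can be sketched as follows. This is a minimal sketch under stated assumptions, not the Sbomer implementation: `SbomComponent` and `DeterministicSbom` are hypothetical names, the component type is a trimmed-down stand-in for `CanonicalComponent`, and the sketch assumes `System.Text.Json` emits record properties in declaration order (true in practice, but not a documented guarantee).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical, trimmed-down stand-in for CanonicalComponent.
public sealed record SbomComponent(string Name, string Version, string Purl);

public static class DeterministicSbom
{
    // Sort by Purl, then Version, with ordinal comparison so the order
    // does not depend on the machine's culture settings.
    public static IReadOnlyList<SbomComponent> Order(IEnumerable<SbomComponent> components) =>
        components
            .OrderBy(c => c.Purl, StringComparer.Ordinal)
            .ThenBy(c => c.Version, StringComparer.Ordinal)
            .ToList();

    // Serialize the ordered list without indentation and return "sha256:<hex>".
    public static string Digest(IEnumerable<SbomComponent> components)
    {
        string json = JsonSerializer.Serialize(Order(components));
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
        return "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Two scans that discover the same packages in a different order should then yield the same digest, which is what the "bit-identical SBOM JSON" acceptance criterion relies on.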
#### SBOM attestation & `/sbom` endpoint
* For an `ArtifactRef` (or `ScanEvidence` EvidenceId):
1. Fetch latest `ScanResult` from DB.
2. Call `ISbomer.FromScan`.
3. Serialize to CycloneDX.
4. Emit `SbomProduced` predicate & DSSE envelope.
5. Persist SBOM JSON blob & link to artifact.
**Acceptance criteria**
* Same `ScanResult` always produces bit-identical SBOM JSON.
* Unit tests verifying:
* PURL mapping correctness.
* Stable ordering.
* `/sbom` endpoint can:
* Build SBOM from scan.
* Return existing SBOM if already generated (idempotence).
---
## 3. Attestation Authority & trust log (Phase 3)
**Goal:** Verify all attestations, store them with a trust log, and produce `PolicyVerified` attestations.
### 3.1 Authority service (`StellaOps.Attest` + `StellaOps.WebApi`)
Key interfaces:
```csharp
public interface IAuthority
{
Task<AttestationId> RecordAsync(DsseEnvelope envelope, CancellationToken ct);
Task<Proof> VerifyChainAsync(ArtifactRef artifact, CancellationToken ct);
}
```
Implementation steps:
1. **Attestations store**
* Table `attestations`:
* `id` (AttestationId, PK)
* `artifact_kind` / `artifact_value`
* `predicate_type` (enum)
* `payload_type`
* `payload_hash`
* `envelope_json`
* `created_at`
* `signer_keyid`
* Table `trust_log`:
* `id`
* `attestation_id`
* `status` (verified / failed / pending)
* `reason`
* `verified_at`
* `verification_data_json` (cert chain, Rekor log index, etc.)
2. **Verification pipeline**
* Implement `IAttestationVerifier.VerifyAsync`:
* Check envelope integrity (no duplicate signatures, required fields).
* Verify crypto signature (keys from configuration store or Sigstore if you integrate later).
* `IAuthority.RecordAsync`:
* Verify envelope.
* Save to `attestations`.
* Add entry to `trust_log`.
* `VerifyChainAsync`:
* For a given `ArtifactRef`:
* Load all attestations for that artifact.
* Ensure each is `status=verified`.
* Compute `InputsLock` = hash of:
* Sorted predicate payloads.
* Feeds manifest.
* Policy rules.
* Return `Proof`.
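One way the `InputsLock` hash could be computed is sketched below. The class name, method signature, and the length-prefix framing are assumptions made for illustration; the plan above only says the hash covers sorted predicate payloads, the feeds manifest, and the policy rules.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class InputsLock
{
    // Hypothetical helper: hashes sorted payloads, then feeds manifest, then policy rules.
    public static string Compute(
        IEnumerable<string> predicatePayloads,
        string feedsManifestJson,
        string policyRulesJson)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        // Ordinal sort makes the result independent of attestation load order.
        foreach (string payload in predicatePayloads.OrderBy(p => p, StringComparer.Ordinal))
        {
            Append(hasher, payload);
        }

        Append(hasher, feedsManifestJson);
        Append(hasher, policyRulesJson);

        return "sha256:" + Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
    }

    // Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
    private static void Append(IncrementalHash hasher, string part)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(part);
        Span<byte> length = stackalloc byte[4];
        BinaryPrimitives.WriteInt32BigEndian(length, bytes.Length);
        hasher.AppendData(length);
        hasher.AppendData(bytes);
    }
}
```

Framing each input with its length is a design choice of this sketch; plain concatenation would let different input sets collide on the same byte stream.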
### 3.2 `/attest` API
* **POST /attest**: submit DSSE envelope (for external tools).
* **GET /attest?artifact=`...`**: list attestations + trust status.
* **GET /attest/{id}/proof**: return verification proof (including InputsLock).
**Acceptance criteria**
* Invalid signatures rejected.
* Tampering test: alter a byte in envelope JSON → verification fails.
* `VerifyChainAsync` returns same `Proof.InputsLock` for identical sets of inputs.
---
## 4. Graph Store & Policy engine (Phase 4)
**Goal:** Store SBOM, vulnerabilities, reachability, VEX, and query them to make deterministic VEX-gate decisions.
### 4.1 Graph model (`StellaOps.Graph`)
Tables (simplified):
* `artifacts`:
* `id` (PK), `kind`, `value`, `digest_algorithm`, `digest_value`
* `components`:
* `id`, `purl`, `name`, `version`, `license`, `digest_algorithm`, `digest_value`
* `vulnerabilities`:
* `id`, `cve_id`, `severity`, `source` (NVD/OSV/vendor), `data_json`
* `vex_statements`:
* `id`, `cve_id`, `component_purl`, `status` (`not_affected`, `affected`, etc.), `source`, `data_json`
* `edges`:
* `id`, `from_kind`, `from_id`, `to_kind`, `to_id`, `relation` (enum), `evidence_id`, `data_json`
Example `relation` values:
* `artifact_contains_component`
* `component_vulnerable_to`
* `component_reachable_via`
* `vulnerability_overridden_by_vex`
* `artifact_scanned_by`
* `decision_verified_by`
Graph access abstraction:
```csharp
public interface IGraphRepository
{
Task UpsertSbomAsync(CanonicalSbom sbom, EvidenceId evidenceId, CancellationToken ct);
Task ApplyVulnerabilityFactsAsync(IEnumerable<VulnerabilityFact> facts, CancellationToken ct);
Task ApplyReachabilityFactsAsync(IEnumerable<ReachabilityFact> facts, CancellationToken ct);
Task ApplyVexStatementsAsync(IEnumerable<VexStatement> vexStatements, CancellationToken ct);
Task<ArtifactGraphSnapshot> GetSnapshotAsync(ArtifactRef artifact, CancellationToken ct);
}
```
`ArtifactGraphSnapshot` is an in-memory projection used by the policy engine.
### 4.2 Policy engine (`StellaOps.Policy`)
Policy lattice (minimal version):
```csharp
public enum RiskState
{
Clean,
VulnerableNotReachable,
VulnerableReachable,
Unknown
}
public sealed class PolicyEvaluationContext
{
public ArtifactRef Artifact { get; init; } = default!;
public ArtifactGraphSnapshot Snapshot { get; init; } = default!;
public IReadOnlyDictionary<string,string>? Environment { get; init; }
}
public interface IPolicyEngine
{
GateDecision Evaluate(PolicyEvaluationContext context);
}
```
Default policy logic:
1. For each vulnerability affecting a component in the artifact:
* Check for VEX:
* If trusted VEX says `not_affected` → ignore.
* Check reachability:
* If proven reachable → mark as `VulnerableReachable`.
* If proven not reachable → `VulnerableNotReachable`.
* If unknown → `Unknown`.
2. Aggregate:
* If any `Critical/High` in `VulnerableReachable` → `Block`.
* Else if any `Critical/High` in `Unknown` → `Warn` and log Unknowns.
* Else → `Allow`.
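The rule text above translates fairly directly into code. The following is a minimal sketch under stated assumptions: `Finding`, `Reachability`, `GateOutcome`, and `DefaultPolicy` are hypothetical names (the real engine would work on `ArtifactGraphSnapshot` and return a `GateDecision`), and severities are plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum RiskState { Clean, VulnerableNotReachable, VulnerableReachable, Unknown }
public enum GateOutcome { Allow, Warn, Block }                 // hypothetical
public enum Reachability { Reachable, NotReachable, Unknown }  // hypothetical

// Hypothetical flattened view of one vulnerability on one component.
public sealed record Finding(
    string CveId, string Severity, Reachability Reachability, bool TrustedVexNotAffected);

public static class DefaultPolicy
{
    // Step 1: trusted VEX "not_affected" wins, otherwise reachability decides.
    public static RiskState Classify(Finding finding)
    {
        if (finding.TrustedVexNotAffected) return RiskState.Clean;
        return finding.Reachability switch
        {
            Reachability.Reachable => RiskState.VulnerableReachable,
            Reachability.NotReachable => RiskState.VulnerableNotReachable,
            _ => RiskState.Unknown
        };
    }

    // Step 2: only Critical/High findings can block or warn.
    public static GateOutcome Evaluate(IEnumerable<Finding> findings)
    {
        List<RiskState> severe = findings
            .Where(f => f.Severity is "Critical" or "High")
            .Select(Classify)
            .ToList();

        if (severe.Contains(RiskState.VulnerableReachable)) return GateOutcome.Block;
        if (severe.Contains(RiskState.Unknown)) return GateOutcome.Warn;
        return GateOutcome.Allow;
    }
}
```

Because `Evaluate` is a pure function of its inputs, it fits the table-driven tests described in the testing plan.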
### 4.3 `/vex-gate` endpoint
Implementation:
* Resolve `ArtifactRef`.
* Build `ArtifactGraphSnapshot` using `IGraphRepository.GetSnapshotAsync`.
* Call `IPolicyEngine.Evaluate`.
* Request `IAuthority.VerifyChainAsync` → `Proof`.
* Emit `PolicyVerified` attestation for this decision.
* Return `GateDecision` + `Proof`.
**Acceptance criteria**
* Given a fixture DB snapshot, calling `/vex-gate` twice yields identical decisions & proof IDs.
* Policy behavior matches the rule text:
* Regression test that modifies severity or reachability → correct decision changes.
---
## 5. Diffs & Unknowns workflow (Phase 5)
### 5.1 Diff engine (`/diff`)
Contracts:
```csharp
public sealed class DiffRequest
{
public string Kind { get; init; } = default!; // "sbom-sbom" | "sbom-runtime"
public string LeftId { get; init; } = default!;
public string RightId { get; init; } = default!;
}
public sealed class DiffComponentChange
{
public string Purl { get; init; } = default!;
public string ChangeType { get; init; } = default!; // "added" | "removed" | "changed"
public string? OldVersion { get; init; }
public string? NewVersion { get; init; }
}
public sealed class DiffResponse
{
public IReadOnlyList<DiffComponentChange> Components { get; init; } = Array.Empty<DiffComponentChange>();
}
```
Implementation:
* SBOM↔SBOM: compare `CanonicalSbom.Components` by PURL (+ version).
* SBOM↔runtime:
* Input runtime snapshot (`process maps`, `loaded libs`, etc.) from agents.
* Map runtime libs to PURLs.
* Determine reachable components from runtime usage → `ReachabilityFact`s into graph.
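The SBOM↔SBOM comparison can be sketched as below. This is an illustration only: `ComponentChange` mirrors `DiffComponentChange` but is a hypothetical name, and the inputs are assumed to be dictionaries keyed by version-less PURL with the version as value.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mirror of DiffComponentChange.
public sealed record ComponentChange(
    string Purl, string ChangeType, string? OldVersion, string? NewVersion);

public static class SbomDiff
{
    // Keys are version-less PURLs, values are versions (an assumption of this sketch).
    public static IReadOnlyList<ComponentChange> Compare(
        IReadOnlyDictionary<string, string> left,
        IReadOnlyDictionary<string, string> right)
    {
        var changes = new List<ComponentChange>();

        // Ordinal ordering keeps the diff output deterministic.
        foreach (string purl in left.Keys.Union(right.Keys).OrderBy(k => k, StringComparer.Ordinal))
        {
            bool inLeft = left.TryGetValue(purl, out string? oldVersion);
            bool inRight = right.TryGetValue(purl, out string? newVersion);

            if (inLeft && !inRight)
                changes.Add(new ComponentChange(purl, "removed", oldVersion, null));
            else if (!inLeft && inRight)
                changes.Add(new ComponentChange(purl, "added", null, newVersion));
            else if (!string.Equals(oldVersion, newVersion, StringComparison.Ordinal))
                changes.Add(new ComponentChange(purl, "changed", oldVersion, newVersion));
        }

        return changes;
    }
}
```

Unchanged components are omitted from the result, so an empty list means the two SBOMs agree on every PURL and version.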
### 5.2 Unknowns module (`/unknowns`)
Data model:
```csharp
public enum UnknownState
{
New,
Triage,
VendorQuery,
Verified,
Closed
}
public sealed class Unknown
{
public Guid Id { get; init; }
public ArtifactRef Artifact { get; init; } = default!;
public string Type { get; init; } = default!; // "vuln-mapping", "reachability", "vex-trust"
public string Subject { get; init; } = default!; // e.g., "CVE-2024-XXXX / purl:pkg:..."
public UnknownState State { get; set; }
public DateTimeOffset CreatedAt { get; init; }
public DateTimeOffset? SlaDeadline { get; set; }
public string? Owner { get; set; }
public string EvidenceJson { get; init; } = default!; // serialized proof / edges
public string? ResolutionNotes { get; set; }
}
```
API:
* `GET /unknowns`: filter by state, artifact, owner.
* `POST /unknowns`: create manual unknown.
* `PATCH /unknowns/{id}`: update state, owner, notes.
Integration:
* Policy engine:
* For any `Unknown` risk state, auto-create Unknown with SLA if not already present.
* When Unknown resolves (e.g., vendor VEX added), re-run policy evaluation for affected artifact(s).
**Acceptance criteria**
* When a `Critical/High` finding evaluates to `RiskState.Unknown` (and nothing is `VulnerableReachable`), `/vex-gate` both:
* Returns `Warn`.
* Creates an Unknown row.
* Transitioning Unknown to `Verified` triggers re-evaluation (integration test).
---
## 6. Offline / air-gapped bundles (Phase 6)
**Goal:** Everything works on a single machine with no network.
### 6.1 Bundle format & IO (`StellaOps.Cli` + `StellaOps.WebApi`)
Directory structure inside ZIP:
```text
/bundle/
feeds/
manifest.json // hashes, timestamps for NVD, OSV, vendor feeds
nvd.json
osv.json
vendor-*.json
sboms/
{artifactDigest}.json
attestations/
*.jsonl // one DSSE envelope per line
proofs/
rekor/
merkle.json
policy/
lattice.json // serialized rules / thresholds
replay/
inputs.lock // hash & metadata of all of the above
```
Implement:
```csharp
public interface IBundleReader
{
Task<Bundle> ReadAsync(string path, CancellationToken ct);
}
public interface IBundleWriter
{
Task WriteAsync(Bundle bundle, string path, CancellationToken ct);
}
```
`Bundle` holds strongly-typed representations of the manifest, SBOMs, attestations, proofs, etc.
### 6.2 CLI commands
* `stella scan --image registry/app:1.2.3 --out bundle.zip`
* Runs scan + sbom locally.
* Writes bundle with:
* SBOM.
* Scan + Sbom attestations.
* Feeds manifest.
* `stella vex-gate --bundle bundle.zip`
* Loads bundle.
* Runs policy engine locally.
* Prints `Allow/Warn/Block` + proof summary.
**Acceptance criteria**
* Given the same `bundle.zip`, `stella vex-gate` on different machines produces identical decisions and proof hashes.
* `/vex-gate?bundle=/path/to/bundle.zip` in API uses same BundleReader and yields same output as CLI.
---
## 7. Testing & quality plan
### 7.1 Unit tests
* Domain & Contracts:
* Serialization roundtrip for all DTOs.
* Attest:
* DSSE encode/decode.
* Signature verification with test key pair.
* Sbom:
* Known `ScanResult` → expected SBOM JSON snapshot.
* Policy:
* Table-driven tests:
* Cases: {severity, reachable, hasVex} → {Allow/Warn/Block}.
### 7.2 Integration tests
* Scanner:
* Use a tiny test image with known components.
* Graph + Policy:
* Seed DB with:
* 1 artifact, 2 components, 1 vuln, 1 VEX, 1 reachability fact.
* Assert that `/vex-gate` returns expected decision.
### 7.3 E2E scenario
Single test flow:
1. `POST /scan` → EvidenceId.
2. `POST /sbom` → SBOM + SbomProduced attestation.
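3. Load dummy vulnerability feed (one `Critical` CVE) → `ApplyVulnerabilityFactsAsync`, and mark it reachable → `ApplyReachabilityFactsAsync` (without a reachability fact the default policy yields `Warn`, not `Block`).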
4. `POST /vex-gate` → Block (no VEX).
5. Add VEX statement → `ApplyVexStatementsAsync`.
6. `POST /vex-gate` → Allow.
Assertions:
* All decisions contain `Proof` with non-empty `InputsLock`.
* `InputsLock` is identical between runs with unchanged inputs.
---
## 8. Concrete backlog (you can paste into Jira)
### Epic 1 - Foundations
* Task: Create solution & project skeleton.
* Task: Implement core domain types (`Digest`, `ArtifactRef`, `EvidenceId`, `Proof`).
* Task: Implement DSSE envelope + JSON serialization.
* Task: Implement basic `IAttestationSigner` with local key pair.
* Task: Define `GateDecision` & `VexGateRequest` contracts.
### Epic 2 - Scanner & Sbomer
* Task: Implement `IArtifactScanner` + `SyftScanner`.
* Task: Implement `/scan` endpoint + attestation.
* Task: Implement `ISbomer` & canonical SBOM model.
* Task: Implement `/sbom` endpoint + SbomProduced attestation.
* Task: Snapshot tests for SBOM determinism.
### Epic 3 - Authority & Trust log
* Task: Design `attestations` & `trust_log` tables (EF Core migrations).
* Task: Implement `IAuthority.RecordAsync` + `VerifyChainAsync`.
* Task: Implement `/attest` endpoints.
* Task: Add proof generation (`InputsLock` hashing).
### Epic 4 - Graph & Policy
* Task: Create graph schema (`artifacts`, `components`, `vulnerabilities`, `edges`, `vex_statements`).
* Task: Implement `IGraphRepository.UpsertSbomAsync`.
* Task: Ingest vulnerability feed (NVD/OSV) into graph facts.
* Task: Implement minimal `IPolicyEngine` with rules.
* Task: Implement `/vex-gate` endpoint.
### Epic 5 - Diff & Unknowns
* Task: Implement SBOM↔SBOM diff logic + `/diff`.
* Task: Create `unknowns` table + API.
* Task: Wire policy engine to auto-create Unknowns.
* Task: Add re-evaluation when Unknown state changes.
### Epic 6 - Offline bundles & CLI
* Task: Implement `BundleReader` / `BundleWriter`.
* Task: Implement `stella scan` and `stella vex-gate`.
* Task: Add `?bundle=` parameter support in APIs.
---
If you'd like, I can next:
* Turn this into actual C# interface files (ready to drop into your repo), or
* Produce a JSON OpenAPI sketch for `/scan`, `/sbom`, `/attest`, `/vex-gate`, `/diff`, `/unknowns`.


@@ -0,0 +1,747 @@
Here's a compact, practical way to add an **explanation graph** that traces every vulnerability verdict back to raw evidence, so auditors can verify results without trusting an LLM.
---
# What it is (in one line)
A small, immutable graph that connects a **verdict** → to **reasoning steps** → to **raw evidence** (source scan records, binary symbol/build-ID matches, external advisories/feeds), with cryptographic hashes so anyone can replay/verify it.
---
# Minimal data model (vendor-neutral)
```json
{
"explanationGraph": {
"scanId": "uuid",
"artifact": {
"purl": "pkg:docker/redis@7.2.4",
"digest": "sha256:…",
"buildId": "elf:abcd…|pe:…|macho:…"
},
"verdicts": [
{
"verdictId": "uuid",
"cve": "CVE-2024-XXXX",
"status": "affected|not_affected|under_investigation",
"policy": "vex/lattice:v1",
"reasoning": [
{"stepId":"s1","type":"callgraph.reachable","evidenceRef":"e1"},
{"stepId":"s2","type":"version.match","evidenceRef":"e2"},
{"stepId":"s3","type":"vendor.vex.override","evidenceRef":"e3"}
],
"provenance": {
"scanner": "StellaOps.Scanner@1.3.0",
"rulesHash": "sha256:…",
"time": "2025-11-25T12:34:56Z",
"attestation": "dsse:…"
}
}
],
"evidence": [
{
"evidenceId":"e1",
"kind":"binary.callgraph",
"hash":"sha256:…",
"summary":"main -> libssl!EVP_* path present",
"blobPointer":"ipfs://… | file://… | s3://…"
},
{
"evidenceId":"e2",
"kind":"source.scan",
"hash":"sha256:…",
"summary":"Detected libssl 3.0.14 via SONAME + buildid",
"blobPointer":"…"
},
{
"evidenceId":"e3",
"kind":"external.feed",
"hash":"sha256:…",
"summary":"Vendor VEX: CVE not reachable when FIPS mode enabled",
"blobPointer":"…",
"externalRef":{"type":"advisory","id":"VEX-ACME-2025-001","url":"…"}
}
]
}
}
```
---
# How it works (flow)
* **Collect** raw artifacts: scanner findings, binary symbol matches (Build-ID / PDB / dSYM), SBOM components, external feeds (NVD, vendor VEX).
* **Normalize** to evidence nodes (immutable blobs with content hash + pointer).
* **Reason** via small, deterministic rules (your lattice/policy). Each rule emits a *reasoning step* that points to evidence.
* **Emit a verdict** with status + full chain of steps.
* **Seal** with DSSE/Sigstore (or your offline signer) so the whole graph is replayable.
---
# Why this helps (auditable AI)
* **No black box**: every “affected/not affected” claim links to verifiable bytes.
* **Deterministic**: same inputs + rules = same verdict (hashes prove it).
* **Reproducible for clients/regulators**: export graph + blobs, they replay locally.
* **LLM-optional**: you can add LLM explanations as *non-authoritative* annotations; the verdict remains policy-driven.
---
# C# drop-in (StellaOps style)
```csharp
public record EvidenceNode(
string EvidenceId, string Kind, string Hash, string Summary, string BlobPointer,
ExternalRef? ExternalRef = null);
public record ReasoningStep(string StepId, string Type, string EvidenceRef);
public record Verdict(
string VerdictId, string Cve, string Status, string Policy,
IReadOnlyList<ReasoningStep> Reasoning, Provenance Provenance);
public record Provenance(string Scanner, string RulesHash, DateTimeOffset Time, string Attestation);
public record ExplanationGraph(
Guid ScanId, Artifact Artifact,
IReadOnlyList<Verdict> Verdicts, IReadOnlyList<EvidenceNode> Evidence);
public record Artifact(string Purl, string Digest, string BuildId);
```
* Persist as immutable documents (Mongo collection `explanations`).
* Store large evidence blobs in object storage; keep `hash` + `blobPointer` in Mongo.
* Sign the serialized graph (DSSE) and store the signature alongside.
---
# UI (compact “trace” panel)
* **Top line:** CVE → Status chip (Affected / Not affected / Needs review).
* **Three tabs:** *Evidence*, *Reasoning*, *Provenance*.
* **One-click export:** “Download Replay Bundle (.zip)” → JSON graph + evidence blobs + verify script.
* **Badge:** “Deterministic ✓” when rulesHash + inputs resolve to prior signature.
---
# Ops & replay
* Bundle a tiny CLI: `stellaops-explain verify graph.json --evidence ./blobs/`.
* Verification checks: all hashes match, DSSE signature valid, rulesHash known, verdict derivable from steps.
---
# Where to start (1-week sprint)
* Day 1-2: Model + Mongo collections + signer service.
* Day 3: Scanner adapters emit `EvidenceNode` records; policy engine emits `ReasoningStep`.
* Day 4: Verdict assembly + DSSE signing + export bundle.
* Day 5: Minimal UI trace panel + CLI verifier.
If you want, I can generate the Mongo schemas, a DSSE signing helper, and the React/Angular trace panel stub next.
Here's a concrete implementation plan you can hand to your developers so they're not guessing what to build.
I'll break it down by **phases**, and inside each phase I'll call out **owner**, **deliverables**, and **acceptance criteria**.
---
## Phase 0 - Scope & decisions (½ day)
**Goal:** Lock in the “rules of the game” so nobody bike-sheds later.
**Decisions to confirm (write in a short ADR):**
1. **Canonical representation & hashing**
* Format for hashing: **canonical JSON** (stable property ordering, UTF-8, no whitespace).
* Algorithm: **SHA-256** for:
* `ExplanationGraph` document
* each `EvidenceNode`
* Hash scope:
* `evidence.hash` = hash of the raw evidence blob (or canonical subset if huge)
* `graphHash` = hash of the entire explanation graph document (minus signature).
2. **Signing**
* Format: **DSSE envelope** (`payloadType = "stellaops/explanation-graph@v1"`).
* Key management: use existing **offline signing key** or Sigstore-style keyless if already in org.
* Signature attached as:
* `provenance.attestation` field inside each verdict **and**
* stored in a separate `explanation_signatures` collection or S3 path for replay.
3. **Storage**
* Metadata: **MongoDB** collection `explanation_graphs`.
* Evidence blobs:
* S3 (or compatible) bucket `stella-explanations/` with layout:
* `evidence/{evidenceId}` or `evidence/{hash}`.
4. **ID formats**
* `scanId`: UUID (string).
* `verdictId`, `evidenceId`, `stepId`: UUID (string).
* `buildId`: reuse existing convention (`elf:<buildid>`, `pe:<guid>`, `macho:<uuid>`).
**Deliverable:** 1-2 page ADR in repo (`/docs/adr/000-explanation-graph.md`).
---
## Phase 1 - Domain model & persistence (backend)
**Owner:** Backend
### 1.1. Define core C# domain models
Place in `StellaOps.Explanations` project or equivalent:
```csharp
public record ArtifactRef(
string Purl,
string Digest,
string BuildId);
public record ExternalRef(
string Type, // "advisory", "vex", "nvd", etc.
string Id,
string Url);
public record EvidenceNode(
string EvidenceId,
string Kind, // "binary.callgraph", "source.scan", "external.feed", ...
string Hash, // sha256 of blob
string Summary,
string BlobPointer, // s3://..., file://..., ipfs://...
ExternalRef? ExternalRef = null);
public record ReasoningStep(
string StepId,
string Type, // "callgraph.reachable", "version.match", ...
string EvidenceRef); // EvidenceId
public record Provenance(
string Scanner,
string RulesHash, // hash of rules/policy bundle used
DateTimeOffset Time,
string Attestation); // DSSE envelope (base64 or JSON)
public record Verdict(
string VerdictId,
string Cve,
string Status, // "affected", "not_affected", "under_investigation"
string Policy, // e.g. "vex.lattice:v1"
IReadOnlyList<ReasoningStep> Reasoning,
Provenance Provenance);
public record ExplanationGraph(
Guid ScanId,
ArtifactRef Artifact,
IReadOnlyList<Verdict> Verdicts,
IReadOnlyList<EvidenceNode> Evidence,
string GraphHash); // sha256 of canonical JSON
```
### 1.2. MongoDB schema
Collection: `explanation_graphs`
Document shape:
```jsonc
{
"_id": "scanId:artifactDigest", // composite key or just ObjectId + separate fields
"scanId": "uuid",
"artifact": {
"purl": "pkg:docker/redis@7.2.4",
"digest": "sha256:...",
"buildId": "elf:abcd..."
},
"verdicts": [ /* Verdict[] */ ],
"evidence": [ /* EvidenceNode[] */ ],
"graphHash": "sha256:..."
}
```
**Indexes:**
* `{ scanId: 1 }`
* `{ "artifact.digest": 1 }`
* `{ "verdicts.cve": 1, "artifact.digest": 1 }` (compound)
* Optional: TTL or archiving mechanism if you don't want to keep these forever.
**Acceptance criteria:**
* You can serialize/deserialize `ExplanationGraph` to Mongo without loss.
* Indexes exist and queries by `scanId`, `artifact.digest`, and `(digest + CVE)` are efficient.
---
## Phase 2 - Evidence ingestion plumbing
**Goal:** Make every relevant raw fact show up as an `EvidenceNode`.
**Owner:** Backend scanner team
### 2.1. Evidence factory service
Create `IEvidenceService`:
```csharp
public interface IEvidenceService
{
Task<EvidenceNode> StoreBinaryCallgraphAsync(
Guid scanId,
ArtifactRef artifact,
byte[] callgraphBytes,
string summary,
ExternalRef? externalRef = null);
Task<EvidenceNode> StoreSourceScanAsync(
Guid scanId,
ArtifactRef artifact,
byte[] scanResultJson,
string summary);
Task<EvidenceNode> StoreExternalFeedAsync(
Guid scanId,
ExternalRef externalRef,
byte[] rawPayload,
string summary);
}
```
Implementation tasks:
1. **Hash computation**
* Compute SHA-256 over raw bytes.
* Prefer a helper:
```csharp
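public static string Sha256Hex(ReadOnlySpan<byte> data) => Convert.ToHexString(SHA256.HashData(data)).ToLowerInvariant(); // needs System.Security.Cryptography (.NET 5+)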
```
2. **Blob storage**
* S3 key format, e.g.: `explanations/{scanId}/{evidenceId}`.
* `BlobPointer` string = `s3://stella-explanations/explanations/{scanId}/{evidenceId}`.
3. **EvidenceNode creation**
* Generate `evidenceId = Guid.NewGuid().ToString("N")`.
* Populate `kind`, `hash`, `summary`, `blobPointer`, `externalRef`.
4. **Graph assembly contract**
* Evidence service **does not** write to Mongo.
* It only uploads blobs and returns `EvidenceNode` objects.
* The **ExplanationGraphBuilder** (next phase) collects them.
**Acceptance criteria:**
* Given a callgraph binary, a corresponding `EvidenceNode` is returned with:
* hash matching the blob (verified in tests),
* blob present in S3,
* summary populated.
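The hash, pointer, and node-creation steps above can be sketched without any storage dependency. This is an illustration under stated assumptions: `EvidenceRecord` and `EvidenceFactory` are hypothetical names, the `sha256:` prefix follows the data-model example, and the actual S3 upload is left out.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical stand-in for EvidenceNode (without ExternalRef).
public sealed record EvidenceRecord(
    string EvidenceId, string Kind, string Hash, string Summary, string BlobPointer);

public static class EvidenceFactory
{
    // Builds the metadata for one evidence blob; uploading the bytes is the caller's job.
    public static EvidenceRecord Create(
        Guid scanId, string kind, byte[] blob, string summary,
        string bucket = "stella-explanations")
    {
        string evidenceId = Guid.NewGuid().ToString("N");

        // Content hash over the raw bytes, lowercase hex with an algorithm prefix.
        string hash = "sha256:" + Convert.ToHexString(SHA256.HashData(blob)).ToLowerInvariant();

        // Pointer layout taken from the blob-storage step: explanations/{scanId}/{evidenceId}.
        string pointer = $"s3://{bucket}/explanations/{scanId}/{evidenceId}";

        return new EvidenceRecord(evidenceId, kind, hash, summary, pointer);
    }
}
```

Keeping this step free of I/O makes the "hash matching the blob" acceptance criterion checkable in a plain unit test.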
---
## Phase 3 - Reasoning & policy integration
**Goal:** Instrument your existing VEX / lattice policy engine to emit deterministic **reasoning steps** instead of just a boolean status.
**Owner:** Policy / rules engine team
### 3.1. Expose rule evaluation trace
Assume you already have something like:
```csharp
VulnerabilityStatus Evaluate(ArtifactRef artifact, string cve, Findings findings);
```
Extend it to:
```csharp
public sealed class RuleEvaluationTrace
{
    public string StepType { get; init; } = default!; // e.g. "version.match"
    public string RuleId { get; init; } = default!; // "rule:openssl:versionFromElf"
    public string Description { get; init; } = default!; // human-readable explanation
    public string EvidenceKind { get; init; } = default!; // to match with EvidenceService
    public object EvidencePayload { get; init; } = default!; // callgraph bytes, json, etc.
}
public sealed class EvaluationResult
{
    public string Status { get; init; } = default!; // "affected", etc.
    public IReadOnlyList<RuleEvaluationTrace> Trace { get; init; } = Array.Empty<RuleEvaluationTrace>();
}
```
New API:
```csharp
EvaluationResult EvaluateWithTrace(
ArtifactRef artifact, string cve, Findings findings);
```
### 3.2. From trace to ReasoningStep + EvidenceNode
Create `ExplanationGraphBuilder`:
```csharp
public interface IExplanationGraphBuilder
{
Task<ExplanationGraph> BuildAsync(
Guid scanId,
ArtifactRef artifact,
IReadOnlyList<CveFinding> cveFindings,
string scannerName);
}
```
Internal algorithm for each `CveFinding`:
1. Call `EvaluateWithTrace(artifact, cve, finding)` to get `EvaluationResult`.
2. For each `RuleEvaluationTrace`:
* Use `EvidenceService` with appropriate method based on `EvidenceKind`.
* Get back an `EvidenceNode` with `evidenceId`.
* Create `ReasoningStep`:
* `StepId = Guid.NewGuid().ToString("N")`
* `Type = trace.StepType`
* `EvidenceRef = evidenceNode.EvidenceId`
3. Assemble `Verdict`:
```csharp
// Named arguments must match the records' PascalCase parameter names.
var verdict = new Verdict(
    VerdictId: Guid.NewGuid().ToString("N"),
    Cve: finding.Cve,
    Status: result.Status,
    Policy: "vex.lattice:v1",
    Reasoning: steps,
    Provenance: new Provenance(
        Scanner: scannerName,
        RulesHash: rulesBundleHash,
        Time: DateTimeOffset.UtcNow,
        Attestation: "" // set in Phase 4
    )
);
```
4. Collect:
* all `EvidenceNode`s (dedupe by `hash` to avoid duplicates).
* all `Verdict`s.
**Acceptance criteria:**
* Given deterministic inputs (scan + rules bundle hash), repeated runs produce:
* same sequence of `ReasoningStep` types,
* same set of `EvidenceNode.hash` values,
* same `status`.
---
## Phase 4 - Graph hashing & DSSE signing
**Owner:** Security / platform
### 4.1. Canonical JSON for hash
Implement:
```csharp
public static class ExplanationGraphSerializer
{
public static string ToCanonicalJson(ExplanationGraph graph)
{
// no graphHash, no attestation in this step
}
}
```
Key requirements:
* Consistent property ordering (e.g. alphabetical).
* No extra whitespace.
* UTF-8 encoding.
* Primitive formatting options fixed (e.g. date as ISO 8601 with `Z`).
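A rough way to meet the ordering and whitespace requirements is to re-emit any JSON text with object properties sorted by ordinal key. The sketch below assumes `System.Text.Json.Nodes` (.NET 6+); `CanonicalJson` is a hypothetical helper name, and this is not an RFC 8785 (JCS) implementation: number and string escaping are whatever `System.Text.Json` produces by default.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

public static class CanonicalJson
{
    // Parses JSON text and writes it back compactly with sorted object keys.
    public static string Canonicalize(string json)
    {
        JsonNode? sorted = Sort(JsonNode.Parse(json));
        return sorted?.ToJsonString() ?? "null";
    }

    // Rebuilds the tree so every node is fresh (a JsonNode cannot have two parents).
    private static JsonNode? Sort(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Sort(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(item => Sort(item)).ToArray()),
        null => null,
        // Leaf values are copied by round-tripping through their JSON text.
        _ => JsonNode.Parse(node.ToJsonString())
    };
}
```

Array order is preserved on purpose: arrays are ordered data, so any sorting of `verdicts` or `evidence` would need to happen when the graph is built, not in the serializer.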
### 4.2. Hash and sign
Before persisting:
```csharp
var graphWithoutHash = graph with { GraphHash = "" };
var canonicalJson = ExplanationGraphSerializer.ToCanonicalJson(graphWithoutHash);
var graphHash = Sha256Hex(Encoding.UTF8.GetBytes(canonicalJson));
// sign DSSE envelope
var envelope = dsseSigner.Sign(
payloadType: "stellaops/explanation-graph@v1",
payload: Encoding.UTF8.GetBytes(canonicalJson)
);
// attach
var signedVerdicts = graph.Verdicts
.Select(v => v with
{
Provenance = v.Provenance with { Attestation = envelope.ToJson() }
})
.ToList();
var finalGraph = graph with
{
GraphHash = $"sha256:{graphHash}",
Verdicts = signedVerdicts
};
```
Then write `finalGraph` to Mongo.
**Acceptance criteria:**
* Recomputing `graphHash` from Mongo document (zeroing `graphHash` and `attestation`) matches stored value.
* Verifying DSSE signature with the public key succeeds.
---
## Phase 5 - Backend APIs & export bundle
**Owner:** Backend / API
### 5.1. Read APIs
Add endpoints (REST-ish):
1. **Get graph for scan-artifact**
`GET /explanations/scans/{scanId}/artifacts/{digest}`
* Returns entire `ExplanationGraph` JSON.
2. **Get single verdict**
`GET /explanations/scans/{scanId}/artifacts/{digest}/cves/{cve}`
* Returns `Verdict` + its subset of `EvidenceNode`s.
3. **Search by CVE**
`GET /explanations/search?cve=CVE-2024-XXXX&digest=sha256:...`
* Returns list of `(scanId, artifact, verdictId)`.
### 5.2. Export replay bundle
`POST /explanations/{scanId}/{digest}/export`
Implementation:
* Create a temporary directory.
* Write:
* `graph.json` → `ExplanationGraph` as stored.
* `signature.json` → DSSE envelope alone (optional).
* Evidence blobs:
* For each `EvidenceNode`:
* Download from S3 and store as `evidence/{evidenceId}`.
* Zip the folder: `explanation-{scanId}-{shortDigest}.zip`.
* Stream as download.
### 5.3. CLI verifier
Small .NET / Go CLI:
Commands:
```bash
stellaops-explain verify graph.json --evidence ./evidence
```
Verification steps:
1. Load `graph.json`, parse to `ExplanationGraph`.
2. Strip `graphHash` & `attestation`, re-serialize canonical JSON.
3. Recompute SHA-256 and compare to `graphHash`.
4. Verify DSSE envelope with public key.
5. For each `EvidenceNode`:
* Read file `./evidence/{evidenceId}`.
* Recompute hash and compare with `evidence.hash`.
Exit with nonzero code if anything fails; print a short summary.
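Step 5 of the verification (evidence file hashes) could look roughly like this. It is a sketch, not the CLI itself: `EvidenceEntry` and `EvidenceVerifier` are hypothetical names, hashes are assumed to be stored as `sha256:<lowercase hex>`, and a real verifier should also reject evidence IDs that contain path separators.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical subset of EvidenceNode needed for verification.
public sealed record EvidenceEntry(string EvidenceId, string Hash);

public static class EvidenceVerifier
{
    // Returns the IDs of evidence files that are missing or whose hash differs.
    public static IReadOnlyList<string> FindMismatches(
        IEnumerable<EvidenceEntry> evidence, string evidenceDir)
    {
        var failures = new List<string>();

        foreach (EvidenceEntry entry in evidence)
        {
            string path = Path.Combine(evidenceDir, entry.EvidenceId);
            if (!File.Exists(path))
            {
                failures.Add(entry.EvidenceId);
                continue;
            }

            // Recompute the content hash and compare it to the recorded one.
            byte[] bytes = File.ReadAllBytes(path);
            string actual = "sha256:" + Convert.ToHexString(SHA256.HashData(bytes)).ToLowerInvariant();
            if (!string.Equals(actual, entry.Hash, StringComparison.Ordinal))
            {
                failures.Add(entry.EvidenceId);
            }
        }

        return failures;
    }
}
```

A CLI wrapper would print the returned IDs and exit with a non-zero code when the list is not empty.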
**Acceptance criteria:**
* Export bundle round-trips: `verify` passes on an exported zip.
* APIs documented in OpenAPI / Swagger.
---
## Phase 6 - UI: Explanation trace panel
**Owner:** Frontend
### 6.1. API integration
New calls in frontend client:
* `GET /explanations/scans/{scanId}/artifacts/{digest}`
* Optionally `GET /explanations/.../cves/{cve}` if you want lazy loading per CVE.
### 6.2. Component UX
On the “vulnerability detail” view:
* Add **“Explanation”** tab with three sections:
1. **Verdict summary**
* Badge: `Affected` / `Not affected` / `Under investigation`.
* Text: `Derived using policy {policy}, rules hash {rulesHash[..8]}.`
2. **Reasoning timeline**
* Vertical list of `ReasoningStep`s:
* Icon per type (e.g. “flow” icon for `callgraph.reachable`).
* Title = `Type` (humanized).
* Click to expand underlying `EvidenceNode.summary`.
* Optional “View raw evidence” link (downloads blob via S3 signed URL).
3. **Provenance**
* Show:
* `scanner`
* `rulesHash`
* `time`
* “Attested ✓” if DSSE verifies on the backend (or precomputed).
4. **Export**
* Button: “Download replay bundle (.zip)”
* Calls export endpoint and triggers browser download.
**Acceptance criteria:**
* For any CVE in UI, a user can:
* See why it is (not) affected in at most 2 clicks.
* Download a replay bundle via the UI.
---
## Phase 7 - Testing strategy
**Owner:** QA + all devs
### 7.1. Unit tests
* EvidenceService:
* Hash matches blob contents.
* BlobPointer formats are as expected.
* ExplanationGraphBuilder:
* Given fixed test input, the resulting graph JSON matches golden file.
* Serializer:
* Canonical JSON is stable under property reordering in the code.
### 7.2. Integration tests
* End-to-end fake scan:
* Simulate scanner output + rules.
* Build graph → persist → fetch via API.
* Run CLI verify on exported bundle in CI.
### 7.3. Security tests
* Signature tampering:
* Modify `graph.json` in exported bundle; `verify` must fail.
* Evidence tampering:
* Modify an evidence file; `verify` must fail.
---
## Phase 8 - Rollout
**Owner:** PM / Tech lead
1. **Feature flag**
* Start with explanation graph generation behind a flag for:
* subset of scanners,
* subset of tenants.
2. **Backfill (optional)**
* If useful, run a one-off job that:
* Takes recent scans,
* Rebuilds explanation graphs,
* Stores them in Mongo.
3. **Docs**
* Short doc page for customers:
* “What is an Explanation Graph?”
* “How to verify it with the CLI?”
---
## Developer checklist (TL;DR)
You can literally drop this into Jira as epics/tasks:
1. **Backend**
* [ ] Implement domain models (`ExplanationGraph`, `Verdict`, `EvidenceNode`, etc.).
* [ ] Implement `IEvidenceService` + S3 integration.
* [ ] Extend policy engine to `EvaluateWithTrace`.
* [ ] Implement `ExplanationGraphBuilder`.
* [ ] Implement canonical serializer, hashing, DSSE signing.
* [ ] Implement Mongo persistence + indexes.
* [ ] Implement REST APIs + export ZIP.
2. **Frontend**
* [ ] Wire new APIs into the vulnerability detail view.
* [ ] Build Explanation tab (Summary / Reasoning / Provenance).
* [ ] Implement “Download replay bundle” button.
3. **Tools**
* [ ] Implement `stellaops-explain verify` CLI.
* [ ] Add CI test that runs verify against a sample bundle.
4. **QA**
* [ ] Golden-file tests for graphs.
* [ ] Signature & evidence tampering tests.
* [ ] UI functional tests on explanations.
---
If you'd like, as a next step I can turn this into:
* concrete **OpenAPI spec** for the new endpoints, and/or
* a **sample `stellaops-explain verify` CLI skeleton** (C# or Go).


@@ -0,0 +1,799 @@
Here's a quick win for making your vuln paths auditor-friendly without retraining any models: **add a plain-language `reason` to every graph edge** (why this edge exists). Think “introduced via dynamic import” or “symbol relocation via `ld`”, not jargon soup.
![A simple vulnerability path showing edges labeled with reasons like "imported at runtime" and "linked via ld".](https://images.unsplash.com/photo-1515879218367-8466d910aaa4?ixlib=rb-4.0.3\&q=80\&fm=jpg\&fit=crop\&w=1600\&h=900)
# Why this helps
* **Explains reachability** at a glance (auditors & devs can follow the story).
* **Reduces false-positive fights** (every hop justifies itself).
* **Stable across languages** (no model changes, just metadata).
# Minimal schema change
Add a `via` object with three fields (`reason`, `evidence`, `provenance`) to every edge in your call/dep graph (SBOM→Reachability→Fix plan):
```json
{
"from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
"to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
"via": {
"reason": "declared_dependency",
"evidence": [
"import urllib3 in requests/adapters.py:12",
"pip freeze: urllib3==2.2.3"
],
"provenance": {
"detector": "StellaOps.Scanner.WebService@1.4.2",
"rule_id": "PY-IMPORT-001",
"confidence": "high"
}
}
}
```
### Standard reason glossary (use as enum)
* `declared_dependency` (manifest lock/SBOM edge)
* `static_call` (direct call site with symbol ref)
* `dynamic_import` (e.g., `__import__`, `importlib`, `require(...)`)
* `reflection_call` (C# `MethodInfo.Invoke`, Java reflection)
* `plugin_discovery` (entry points, ServiceLoader, MEF)
* `symbol_relocation` (ELF/PE/Mach-O relocation binds)
* `plt_got_resolution` (ELF PLT/GOT jump to symbol)
* `ld_preload_injection` (runtime injected .so/.dll)
* `env_config_path` (path read from env/config enables load)
* `taint_propagation` (user input reaches sink)
* `vendor_patch_alias` (function moved/aliased across versions)
# Emission rules (keep it deterministic)
* **One reason per edge**, short, lowercase snake_case from glossary.
* **Up to 3 evidence strings** (file:line or binary section + symbol).
* **Confidence**: `high|medium|low` with a single, stable rubric:
* high = exact symbol/call site or relocation
* medium = heuristic import/loader path
* low = inferred from naming or optional plugin
# UI/Report snippet
Render paths like:
```
app → requests → urllib3 → OpenSSL EVP_PKEY_new_raw_private_key
• declared_dependency (poetry.lock)
• static_call (requests.adapters:345)
• symbol_relocation (ELF .rela.plt: _EVP_PKEY_new_raw_private_key)
```
# C# drop-in (for your .NET 10 code)
Edge builder with reason/evidence:
```csharp
public sealed record EdgeId(string From, string To);
public sealed record EdgeEvidence(
string Reason, // enum string from glossary
IReadOnlyList<string> Evidence, // file:line, symbol, section
string Confidence, // high|medium|low
string Detector, // component@version
string RuleId // stable rule key
);
public sealed record GraphEdge(EdgeId Id, EdgeEvidence Via);
public static class EdgeFactory
{
public static GraphEdge DeclaredDependency(string from, string to, string manifestPath)
=> new(new EdgeId(from, to),
new EdgeEvidence(
Reason: "declared_dependency",
Evidence: new[] { $"manifest:{manifestPath}" },
Confidence: "high",
Detector: "StellaOps.Scanner.WebService@1.0.0",
RuleId: "DEP-LOCK-001"));
public static GraphEdge SymbolRelocation(string from, string to, string objPath, string section, string symbol)
=> new(new EdgeId(from, to),
new EdgeEvidence(
Reason: "symbol_relocation",
Evidence: new[] { $"{objPath}::{section}:{symbol}" },
Confidence: "high",
Detector: "StellaOps.Scanner.WebService@1.0.0",
RuleId: "BIN-RELOC-101"));
}
```
# Integration checklist (fast path)
* Emit `via.reason/evidence/provenance` for **all** edges (SBOM, source, binary).
* Validate `reason` against glossary; reject free-text.
* Add a “**Why this edge exists**” column in your path tables.
* In JSON/CSV exports, keep columns: `from,to,reason,confidence,evidence0..2,rule_id`.
* In the console, collapse evidence by default; expand on click.
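The export column layout in the checklist above can be sketched as a small row formatter. This is a sketch only: the `EdgeExport` shape and the helper names are assumptions for illustration, and quoting follows common RFC 4180 practice.

```typescript
// Hypothetical flattened edge shape for export; not an existing StellaOps type.
interface EdgeExport {
  from: string;
  to: string;
  reason: string;
  confidence: "high" | "medium" | "low";
  evidence: string[];
  ruleId: string;
}

const CSV_HEADER = "from,to,reason,confidence,evidence0,evidence1,evidence2,rule_id";

// Quote a field when it contains a comma, double quote, or line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

function toCsvRow(edge: EdgeExport): string {
  // Always emit exactly three evidence columns so the row shape stays stable.
  const evidence = [0, 1, 2].map((i) => edge.evidence[i] ?? "");
  return [edge.from, edge.to, edge.reason, edge.confidence, ...evidence, edge.ruleId]
    .map(csvField)
    .join(",");
}

const exampleRow = toCsvRow({
  from: "app",
  to: "pkg:pypi/requests@2.32.3",
  reason: "declared_dependency",
  confidence: "high",
  evidence: ["manifest:poetry.lock"],
  ruleId: "DEP-LOCK-001",
});
console.log(CSV_HEADER);
console.log(exampleRow);
```

Padding to three evidence columns keeps every row the same width, which makes the CSV diffable between scans.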
If you want, I'll plug this into your StellaOps graph contracts (Concelier/Cartographer) and produce the enum + validators and a tiny renderer for your docs.
Cool, let's turn this into a concrete, dev-friendly implementation plan you can actually hand to teams.
I'll structure it by phases and by component (schema, producers, APIs, UI, testing, rollout) so you can slice it into tickets easily.
---
## 0. Recap of what we're building
**Goal:**
Every edge in your vuln path graph (SBOM → Reachability → Fix plan) carries **machine-readable, auditor-friendly metadata**:
```jsonc
{
"from": "pkg:pypi/requests@2.32.3#requests.sessions.Session.request",
"to": "pkg:pypi/urllib3@2.2.3#urllib3.connectionpool.HTTPConnectionPool.urlopen",
"via": {
"reason": "declared_dependency", // from a controlled enum
"evidence": [
"manifest:requirements.txt:3", // up to 3 short evidence strings
"pip freeze: urllib3==2.2.3"
],
"provenance": {
"detector": "StellaOps.Scanner.WebService@1.4.2",
"rule_id": "PY-IMPORT-001",
"confidence": "high"
}
}
}
```
Standard **reason glossary** (enum):
* `declared_dependency`
* `static_call`
* `dynamic_import`
* `reflection_call`
* `plugin_discovery`
* `symbol_relocation`
* `plt_got_resolution`
* `ld_preload_injection`
* `env_config_path`
* `taint_propagation`
* `vendor_patch_alias`
* `unknown` (fallback only when you truly can't do better)
---
## 1. Design & contracts (shared work for backend & frontend)
### 1.1 Define the canonical edge metadata types
**Owner:** Platform / shared lib team
**Tasks:**
1. In your shared C# library (used by scanners + API), define:
```csharp
public enum EdgeReason
{
Unknown = 0,
DeclaredDependency,
StaticCall,
DynamicImport,
ReflectionCall,
PluginDiscovery,
SymbolRelocation,
PltGotResolution,
LdPreloadInjection,
EnvConfigPath,
TaintPropagation,
VendorPatchAlias
}
public enum EdgeConfidence
{
Low = 0,
Medium,
High
}
public sealed record EdgeProvenance(
string Detector, // e.g., "StellaOps.Scanner.WebService@1.4.2"
string RuleId, // e.g., "PY-IMPORT-001"
EdgeConfidence Confidence
);
public sealed record EdgeVia(
EdgeReason Reason,
IReadOnlyList<string> Evidence,
EdgeProvenance Provenance
);
public sealed record EdgeId(string From, string To);
public sealed record GraphEdge(
EdgeId Id,
EdgeVia Via
);
```
2. Enforce **max 3 evidence strings** via a small helper to avoid accidental spam:
```csharp
public static class EdgeViaFactory
{
private const int MaxEvidence = 3;
public static EdgeVia Create(
EdgeReason reason,
IEnumerable<string> evidence,
string detector,
string ruleId,
EdgeConfidence confidence
)
{
var ev = evidence
.Where(s => !string.IsNullOrWhiteSpace(s))
.Take(MaxEvidence)
.ToArray();
return new EdgeVia(
Reason: reason,
Evidence: ev,
Provenance: new EdgeProvenance(detector, ruleId, confidence)
);
}
}
```
**Acceptance criteria:**
* [ ] EdgeReason enum defined and shared in a reusable package.
* [ ] EdgeVia and EdgeProvenance types exist and are serializable to JSON.
* [ ] Evidence is capped to 3 entries and cannot be null (empty list allowed).
---
### 1.2 API / JSON contract
**Owner:** API team
**Tasks:**
1. Extend your existing graph edge DTO to include `via`:
```csharp
public sealed record GraphEdgeDto
{
public string From { get; init; } = default!;
public string To { get; init; } = default!;
public EdgeViaDto Via { get; init; } = default!;
}
public sealed record EdgeViaDto
{
public string Reason { get; init; } = default!; // enum as string
public string[] Evidence { get; init; } = Array.Empty<string>();
public EdgeProvenanceDto Provenance { get; init; } = default!;
}
public sealed record EdgeProvenanceDto
{
public string Detector { get; init; } = default!;
public string RuleId { get; init; } = default!;
public string Confidence { get; init; } = default!; // "high|medium|low"
}
```
2. Ensure JSON is **additive** (backward compatible):
* `via` is **non-nullable** in responses from the new API version.
* If you must keep a legacy endpoint, add **v2** endpoints that guarantee `via`.
3. Update OpenAPI spec:
* Document `via.reason` as enum string, including allowed values.
* Document `via.provenance.detector`, `rule_id`, `confidence`.
**Acceptance criteria:**
* [ ] OpenAPI / Swagger shows `via.reason` as a string enum + description.
* [ ] New clients can deserialize edges with `via` without custom hacks.
* [ ] Old clients remain unaffected (either keep old endpoint or allow them to ignore `via`).
---
## 2. Producers: add reasons & evidence where edges are created
You likely have 3 main edge producers:
* SBOM / manifest / lockfile analyzers
* Source analyzers (call graph, taint analysis)
* Binary analyzers (ELF/PE/Mach-O, containers)
Treat each as a mini-project with identical patterns.
---
### 2.1 SBOM / manifest edges
**Owner:** SBOM / dep graph team
**Tasks:**
1. Identify all code paths that create “declared dependency” edges:
* Manifest → Package
* Root module → Imported package (if you store these explicitly)
2. Replace plain edge construction with factory calls:
```csharp
public static class EdgeFactory
{
private const string DetectorName = "StellaOps.Scanner.Sbom@1.0.0";
public static GraphEdge DeclaredDependency(
string from,
string to,
string manifestPath,
string? dependencySpecLine
)
{
var evidence = new List<string>
{
$"manifest:{manifestPath}"
};
if (!string.IsNullOrWhiteSpace(dependencySpecLine))
evidence.Add($"spec:{dependencySpecLine}");
var via = EdgeViaFactory.Create(
EdgeReason.DeclaredDependency,
evidence,
DetectorName,
"DEP-LOCK-001",
EdgeConfidence.High
);
return new GraphEdge(new EdgeId(from, to), via);
}
}
```
3. Make sure each SBOM/manifest edge sets:
* `reason = declared_dependency`
* `confidence = high`
* Evidence includes at least `manifest:<path>` and, if possible, line or spec snippet.
**Acceptance criteria:**
* [ ] Any SBOM-generated edge returns with `via.reason == declared_dependency`.
* [ ] Evidence contains manifest path for ≥ 99% of SBOM edges.
* [ ] Unit tests cover at least: normal manifest, multiple manifests, malformed manifest.
---
### 2.2 Source code call graph edges
**Owner:** Static analysis / call graph team
**Tasks:**
1. Map current edge types → reasons:
* Direct function/method calls → `static_call`
* Reflection (Java/C#) → `reflection_call`
* Dynamic imports (`__import__`, `importlib`, `require(...)`) → `dynamic_import`
* Plugin systems (entry points, ServiceLoader, MEF) → `plugin_discovery`
* Taint / dataflow edges (user input → sink) → `taint_propagation`
2. Implement helper factories:
```csharp
public static class SourceEdgeFactory
{
private const string DetectorName = "StellaOps.Scanner.Source@1.0.0";
public static GraphEdge StaticCall(
string fromSymbol,
string toSymbol,
string filePath,
int lineNumber
)
{
var evidence = new[]
{
$"callsite:{filePath}:{lineNumber}"
};
var via = EdgeViaFactory.Create(
EdgeReason.StaticCall,
evidence,
DetectorName,
"SRC-CALL-001",
EdgeConfidence.High
);
return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
}
public static GraphEdge DynamicImport(
string fromSymbol,
string toSymbol,
string filePath,
int lineNumber
)
{
var via = EdgeViaFactory.Create(
EdgeReason.DynamicImport,
new[] { $"importsite:{filePath}:{lineNumber}" },
DetectorName,
"SRC-DYNIMPORT-001",
EdgeConfidence.Medium
);
return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
}
// Similar for ReflectionCall, PluginDiscovery, TaintPropagation...
}
```
3. Replace all direct `new GraphEdge(...)` calls in source analyzers with these factories.
**Acceptance criteria:**
* [ ] Direct call edges produce `reason = static_call` with file:line evidence.
* [ ] Reflection/dynamic import edges use correct reasons and mark `confidence = medium` (or high where you're certain).
* [ ] Unit tests check that for a known source file, the resulting edges contain expected `reason`, `evidence`, and `rule_id`.
---
### 2.3 Binary / container analyzers
**Owner:** Binary analysis / SCA team
**Tasks:**
1. Map binary features to reasons:
* Symbol relocations + PLT/GOT edges → `symbol_relocation` or `plt_got_resolution`
* LD_PRELOAD or injection edges → `ld_preload_injection`
2. Implement factory:
```csharp
public static class BinaryEdgeFactory
{
private const string DetectorName = "StellaOps.Scanner.Binary@1.0.0";
public static GraphEdge SymbolRelocation(
string fromSymbol,
string toSymbol,
string binaryPath,
string section,
string relocationName
)
{
var evidence = new[]
{
$"{binaryPath}::{section}:{relocationName}"
};
var via = EdgeViaFactory.Create(
EdgeReason.SymbolRelocation,
evidence,
DetectorName,
"BIN-RELOC-101",
EdgeConfidence.High
);
return new GraphEdge(new EdgeId(fromSymbol, toSymbol), via);
}
}
```
3. Wire up all binary edge creation to use this.
**Acceptance criteria:**
* [ ] For a test binary with a known relocation, edges include `reason = symbol_relocation` and section/symbol in evidence.
* [ ] No binary edge is created without `via`.
---
## 3. Storage & migrations
This depends on your backing store, but the pattern is similar.
### 3.1 Relational (SQL) example
**Owner:** Data / infra team
**Tasks:**
1. Add columns:
```sql
ALTER TABLE graph_edges
ADD COLUMN via_reason VARCHAR(64) NOT NULL DEFAULT 'unknown',
ADD COLUMN via_evidence JSONB NOT NULL DEFAULT '[]'::jsonb,
ADD COLUMN via_detector VARCHAR(255) NOT NULL DEFAULT 'unknown',
ADD COLUMN via_rule_id VARCHAR(128) NOT NULL DEFAULT 'unknown',
ADD COLUMN via_confidence VARCHAR(16) NOT NULL DEFAULT 'low';
```
2. Update ORM model:
```csharp
public class EdgeEntity
{
public string From { get; set; } = default!;
public string To { get; set; } = default!;
public string ViaReason { get; set; } = "unknown";
public string[] ViaEvidence { get; set; } = Array.Empty<string>();
public string ViaDetector { get; set; } = "unknown";
public string ViaRuleId { get; set; } = "unknown";
public string ViaConfidence { get; set; } = "low";
}
```
3. Add mapping to domain `GraphEdge`:
```csharp
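public static GraphEdge ToDomain(this EdgeEntity e)
{
    // Stored reasons are assumed to be snake_case ("static_call"). Drop the
    // underscores so the case-insensitive parse can match the PascalCase
    // enum names (StaticCall); otherwise every multi-word reason would
    // silently fall back to Unknown.
    var reasonName = e.ViaReason.Replace("_", "");
    var via = new EdgeVia(
        Reason: Enum.TryParse<EdgeReason>(reasonName, true, out var r) ? r : EdgeReason.Unknown,
        Evidence: e.ViaEvidence,
        Provenance: new EdgeProvenance(
            Detector: e.ViaDetector,
            RuleId: e.ViaRuleId,
            Confidence: Enum.TryParse<EdgeConfidence>(e.ViaConfidence, true, out var c) ? c : EdgeConfidence.Low
        )
    );
    return new GraphEdge(new EdgeId(e.From, e.To), via);
}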
```
4. **Backfill existing data** (optional but recommended):
* For edges with a known “type” column, map to best-fit `reason`.
* If you can't infer: set `reason = unknown`, `confidence = low`, `detector = "backfill@<version>"`.
**Acceptance criteria:**
* [ ] DB migration runs cleanly in staging and prod.
* [ ] No existing reader breaks: default values keep queries functioning.
* [ ] Edge round-trip (domain → DB → API JSON) retains `via` fields correctly.
---
## 4. API & service layer
**Owner:** API / service team
**Tasks:**
1. Wire domain model → DTOs:
```csharp
public static GraphEdgeDto ToDto(this GraphEdge edge)
{
    return new GraphEdgeDto
    {
        From = edge.Id.From,
        To = edge.Id.To,
        Via = new EdgeViaDto
        {
            // JsonNamingPolicy.SnakeCaseLower (System.Text.Json, .NET 8+)
            // converts "StaticCall" to "static_call".
            Reason = JsonNamingPolicy.SnakeCaseLower.ConvertName(edge.Via.Reason.ToString()),
            Evidence = edge.Via.Evidence.ToArray(),
            Provenance = new EdgeProvenanceDto
            {
                Detector = edge.Via.Provenance.Detector,
                RuleId = edge.Via.Provenance.RuleId,
                Confidence = edge.Via.Provenance.Confidence.ToString().ToLowerInvariant()
            }
        }
    };
}
```
2. If you accept edges via API (internal services), validate:
* `reason` must be one of the known values; otherwise reject or coerce to `unknown`.
* `evidence` length ≤ 3.
* Trim whitespace and limit each evidence string length (e.g. 256 chars).
3. Versioning:
* Introduce `/v2/graph/paths` (or similar) that guarantees `via`.
* Keep `/v1/...` unchanged or mark deprecated.
**Acceptance criteria:**
* [ ] Path API returns `via.reason` and `via.evidence` for all edges in new endpoints.
* [ ] Invalid reason strings are rejected or converted to `unknown` with a log.
* [ ] Integration tests cover full flow: repo → scanner → DB → API → JSON.
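The ingest validation rules in task 2 above can be sketched as a normalizer. This is a minimal sketch that takes the "coerce to `unknown`" option rather than rejecting, and truncates rather than erroring on oversized evidence; the `normalizeVia` name and shapes are hypothetical.

```typescript
// Allowed reasons, copied from the glossary in this plan.
const EDGE_REASONS = [
  "unknown",
  "declared_dependency",
  "static_call",
  "dynamic_import",
  "reflection_call",
  "plugin_discovery",
  "symbol_relocation",
  "plt_got_resolution",
  "ld_preload_injection",
  "env_config_path",
  "taint_propagation",
  "vendor_patch_alias",
] as const;
type EdgeReason = (typeof EDGE_REASONS)[number];

const MAX_EVIDENCE = 3;
const MAX_EVIDENCE_LENGTH = 256; // example limit from the text

interface ViaInput {
  reason: string;
  evidence: string[];
}

interface ViaNormalized {
  reason: EdgeReason;
  evidence: string[];
  coerced: boolean; // true when the reason was not in the glossary (log it)
}

function normalizeVia(input: ViaInput): ViaNormalized {
  const known = (EDGE_REASONS as readonly string[]).indexOf(input.reason) !== -1;
  const evidence = input.evidence
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .slice(0, MAX_EVIDENCE)
    .map((s) => s.slice(0, MAX_EVIDENCE_LENGTH));
  return {
    reason: known ? (input.reason as EdgeReason) : "unknown",
    evidence,
    coerced: !known,
  };
}
```

A stricter service could return a 4xx response when `coerced` is true instead of storing the edge; either way the `coerced` flag gives the log line the acceptance criteria ask for.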
---
## 5. UI: make paths auditor-friendly
**Owner:** Frontend team
**Tasks:**
1. **Path details UI**:
For each edge in the vulnerability path table:
* Show a **“Reason” column** with a small pill:
* `static_call` → “Static call”
* `declared_dependency` → “Declared dependency”
* etc.
* Below or on hover, show **primary evidence** (first evidence string).
2. **Edge details panel** (drawer/modal):
When user clicks an edge:
* Show:
* From → To (symbols/packages)
* Reason (with friendly description per enum)
* Evidence list (each on its own line)
* Detector, rule id, confidence
3. **Filtering & sorting (optional but powerful)**:
* Filter edges by `reason` (multi-select).
* Filter by `confidence` (e.g. show only high/medium).
* This helps auditors quickly isolate more speculative edges.
4. **UX text / glossary**:
* Add a small “?” tooltip that links to a glossary explaining each reason type in human language.
**Acceptance criteria:**
* [ ] For a given vulnerability, the path view shows a “Reason” column per edge.
* [ ] Clicking an edge reveals all evidence and provenance information.
* [ ] UX has a glossary/tooltip explaining what each reason means in plain English.
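The "Reason" pill and primary-evidence cell from the tasks above might be wired roughly like this. Only the "Static call" and "Declared dependency" labels come from this plan; the remaining labels and the `EdgeRow` shape are illustrative assumptions.

```typescript
// Friendly labels per reason. A Map avoids accidental hits on
// Object.prototype keys such as "constructor".
const REASON_LABELS = new Map<string, string>([
  ["declared_dependency", "Declared dependency"],
  ["static_call", "Static call"],
  ["dynamic_import", "Dynamic import"],
  ["reflection_call", "Reflection call"],
  ["plugin_discovery", "Plugin discovery"],
  ["symbol_relocation", "Symbol relocation"],
  ["plt_got_resolution", "PLT/GOT resolution"],
  ["ld_preload_injection", "LD_PRELOAD injection"],
  ["env_config_path", "Env/config path"],
  ["taint_propagation", "Taint propagation"],
  ["vendor_patch_alias", "Vendor patch alias"],
  ["unknown", "Unknown"],
]);

function reasonLabel(reason: string): string {
  // Unrecognized values fall back to "Unknown" instead of leaking raw strings.
  return REASON_LABELS.get(reason) ?? "Unknown";
}

// Hypothetical row shape as returned by the path API.
interface EdgeRow {
  reason: string;
  evidence: string[];
}

// Table cell content: pill label plus the primary (first) evidence string, if any.
function pathRowCells(edge: EdgeRow): { pill: string; primaryEvidence: string } {
  return { pill: reasonLabel(edge.reason), primaryEvidence: edge.evidence[0] ?? "" };
}
```

The same map can back the glossary tooltip, so labels and descriptions stay in one place.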
---
## 6. Testing strategy
**Owner:** QA + each feature team
### 6.1 Unit tests
* **Factories**: verify correct mapping from input to `EdgeVia`:
* Reason set correctly.
* Evidence trimmed, max 3.
* Confidence matches rubric (high for relocations, medium for heuristic imports, etc.).
* **Serialization**: `EdgeVia` → JSON and back.
### 6.2 Integration tests
Set up **small fixtures**:
1. **Simple dependency project**:
* Example: Python project with `requirements.txt` → `requests` → `urllib3`.
* Expected edges:
* App → requests: `declared_dependency`, evidence includes `requirements.txt`.
* requests → urllib3: `declared_dependency`, plus static call edges.
2. **Dynamic import case**:
* A module using `importlib.import_module("mod")`.
* Ensure edge is `dynamic_import` with `confidence = medium`.
3. **Binary edge case**:
* Test ELF with known symbol relocation.
* Ensure an edge with `reason = symbol_relocation` exists.
### 6.3 End-to-end tests
* Run full scan on a sample repo and:
* Hit path API.
* Assert every edge has non-null `via` fields.
* Spot check a few known edges for exact `reason` and evidence.
**Acceptance criteria:**
* [ ] Automated tests fail if any edge is emitted without `via`.
* [ ] Coverage includes at least one example for each `EdgeReason` you support.
---
## 7. Observability, guardrails & rollout
### 7.1 Metrics & logging
**Owner:** Observability / platform
**Tasks:**
* Emit metrics:
* `% edges with reason != unknown`
* Count by `reason` and `confidence`
* Log warnings when:
* Edge is emitted with `reason = unknown`.
* Evidence is empty for a non-unknown reason.
**Acceptance criteria:**
* [ ] Dashboards showing distribution of edge reasons over time.
* [ ] Alerts if `unknown` reason edges exceed a threshold (e.g. >5%).
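The two metrics above (count by reason, share of `unknown`) and the example 5% alert can be sketched as one aggregation. The function and field names are hypothetical; a real implementation would presumably feed an existing metrics pipeline rather than return an object.

```typescript
interface ReasonStats {
  total: number;
  countByReason: Map<string, number>;
  unknownRatio: number; // share of edges with reason == "unknown", 0..1
  alert: boolean;       // true when unknownRatio exceeds the threshold
}

// alertThreshold defaults to the 5% example from the acceptance criteria.
function summarizeReasons(reasons: string[], alertThreshold = 0.05): ReasonStats {
  const countByReason = new Map<string, number>();
  for (const reason of reasons) {
    countByReason.set(reason, (countByReason.get(reason) ?? 0) + 1);
  }
  const total = reasons.length;
  const unknownCount = countByReason.get("unknown") ?? 0;
  // An empty scan has no unknown edges, so report 0 rather than NaN.
  const unknownRatio = total === 0 ? 0 : unknownCount / total;
  return { total, countByReason, unknownRatio, alert: unknownRatio > alertThreshold };
}
```

The dashboard metric "% edges with reason != unknown" is then `(1 - unknownRatio) * 100`.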
---
### 7.2 Rollout plan
**Owner:** PM + tech leads
**Steps:**
1. **Phase 1 - Dark-launch metadata:**
* Start generating & storing `via` for new scans.
* Keep UI unchanged.
* Monitor metrics, unknown ratio, and storage overhead.
2. **Phase 2 - Enable for internal users:**
* Toggle UI on (feature flag for internal / beta users).
* Collect feedback from security engineers and auditors.
3. **Phase 3 - General availability:**
* Enable UI for all.
* Update customer-facing documentation & audit guides.
---
### 7.3 Documentation
**Owner:** Docs / PM
* Short **“Why this edge exists”** section in:
* Product docs (for customers).
* Internal runbooks (for support & SEs).
* Include:
* Table of reasons → human descriptions.
* Examples of path explanations (e.g., “This edge exists because `app` declares `urllib3` in `requirements.txt` and calls it in `client.py:42`”).
---
## 8. Ready-to-use ticket breakdown
You can almost copy-paste these into your tracker:
1. **Shared**: Define EdgeReason, EdgeVia & EdgeProvenance in shared library, plus EdgeViaFactory.
2. **SBOM**: Use EdgeFactory.DeclaredDependency for all manifest-generated edges.
3. **Source**: Wire all call-graph edges to SourceEdgeFactory (static_call, dynamic_import, reflection_call, plugin_discovery, taint_propagation).
4. **Binary**: Wire relocations/PLT/GOT edges to BinaryEdgeFactory (symbol_relocation, plt_got_resolution, ld_preload_injection).
5. **Data**: Add via_* columns/properties to graph_edges storage and map to/from domain.
6. **API**: Extend graph path DTOs to include `via`, update OpenAPI, and implement /v2 endpoints if needed.
7. **UI**: Show edge reason, evidence, and provenance in vulnerability path screens and add filters.
8. **Testing**: Add unit, integration, and end-to-end tests ensuring every edge has non-null `via`.
9. **Observability**: Add metrics and logs for edge reasons and unknown rates.
10. **Docs & rollout**: Write glossary + auditor docs and plan staged rollout.
---
If you tell me a bit about your current storage (e.g., Neo4j vs SQL) and the service names, I can tailor this into an even more literal set of code snippets and migrations to match your stack exactly.


@@ -0,0 +1,819 @@
Here's a crisp, ready-to-ship concept you can drop into StellaOps: an **Unknowns Registry** that captures ambiguous scanner artifacts (stripped binaries, unverifiable packages, orphaned PURLs, missing digests) and treats them as first-class citizens with probabilistic severity and trust-decay, so you stay transparent without blocking delivery.
### What this solves (in plain terms)
* **No silent drops:** every “can't verify / can't resolve” is tracked, not discarded.
* **Quantified risk:** unknowns still roll into a portfolio-level risk number with confidence intervals.
* **Trust over time:** stale unknowns get *riskier* the longer they remain unresolved.
* **Client confidence:** visibility + trajectory (are unknowns shrinking?) becomes a maturity signal.
### Core data model (CycloneDX/SPDX compatible, attaches to your SBOM spine)
```yaml
UnknownArtifact:
id: urn:stella:unknowns:<uuid>
observedAt: <RFC3339>
origin:
source: scanner|ingest|runtime
feed: <name/version>
evidence: [ filePath, containerDigest, buildId, sectionHints ]
identifiers:
purl?: <string> # orphan/incomplete PURL allowed
hash?: <sha256|null> # missing digest allowed
cpe?: <string|null>
classification:
type: binary|library|package|script|config|other
reason: stripped_binary|missing_signature|no_feed_match|ambiguous_name|checksum_mismatch|other
metrics:
baseUnkScore: 0..1
confidence: 0..1 # model confidence in the *score*
trust: 0..1 # provenance trust (sig/attest, feed quality)
decayPolicyId: <ref>
resolution:
status: unresolved|suppressed|mitigated|confirmed-benign|confirmed-risk
updatedAt: <RFC3339>
notes: <text>
links:
scanId: <ref>
componentId?: <ref to SBOM component if later mapped>
attestations?: [ dsse, in-toto, rekorRef ]
```
### Scoring (simple, explainable, deterministic)
* **Unknown Risk (UR):**
`UR_t = clamp( (B * (1 + A)) * D_t * (1 - T) , 0, 1 )`
* `B` = `baseUnkScore` (heuristics: file entropy, section hints, ELF flags, import tables, size, location)
* `A` = **Environment Amplifier** (runtime proximity: container entrypoint? PID namespace? network caps?)
* `T` = **Trust** (sig/attest/registry reputation/feed pedigree normalized to 0..1)
* `D_t` = **Trust-decay multiplier** over time `t`:
* Linear: `D_t = 1 + k * daysOpen` (e.g., `k = 0.01`)
* or Exponential: `D_t = e^(λ * daysOpen)` (e.g., `λ = 0.005`)
* **Portfolio roll-up:** use **P90 of UR_t** across images + **sum of top-N UR_t** to avoid dilution.
### Policies & SLOs
* **SLO:** *Unknowns burn-down* ≤ X% week-over-week; *Median age* ≤ Y days.
* **Gates:** block promotion when (a) any `UR_t ≥ 0.8`, or (b) more than `M` unknowns with age > `Z` days.
* **Suppressions:** require justification + expiry; suppression reduces `A` but does **not** zero `D_t`.
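The two gate conditions above can be sketched as a pure function. The shapes and names are hypothetical, and the sketch assumes the caller passes only the unknowns that should count toward the gate (for example, unresolved ones), since the text does not say how suppressed items are treated.

```typescript
// Hypothetical shapes; field names are illustrative, not an existing StellaOps API.
interface GateInput {
  unknownRisk: number; // UR_t in 0..1
  ageDays: number;     // days since the unknown was first observed
}

interface GatePolicy {
  urThreshold: number;      // 0.8 in the text
  maxAgedUnknowns: number;  // M
  ageThresholdDays: number; // Z
}

interface GateResult {
  blocked: boolean;
  reasons: string[];
}

function evaluatePromotionGate(unknowns: GateInput[], policy: GatePolicy): GateResult {
  const reasons: string[] = [];

  // Rule (a): any single unknown at or above the UR threshold blocks promotion.
  const highRisk = unknowns.filter((u) => u.unknownRisk >= policy.urThreshold).length;
  if (highRisk > 0) {
    reasons.push(`${highRisk} unknown(s) with UR_t >= ${policy.urThreshold}`);
  }

  // Rule (b): more than M unknowns older than Z days blocks promotion.
  const aged = unknowns.filter((u) => u.ageDays > policy.ageThresholdDays).length;
  if (aged > policy.maxAgedUnknowns) {
    reasons.push(
      `${aged} unknown(s) older than ${policy.ageThresholdDays} days (limit ${policy.maxAgedUnknowns})`
    );
  }

  return { blocked: reasons.length > 0, reasons };
}
```

Returning the reasons alongside the verdict keeps the gate explainable in CI logs.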
### Trust-decay policies (pluggable)
```yaml
DecayPolicy:
id: decay:default:v1
kind: linear|exponential|custom
params:
k: 0.01 # linear slope per day
cap: 2.0 # max multiplier
```
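To get a feel for what `cap` means in practice, one can solve each decay formula for the day the multiplier saturates: `1 + k·d = cap` gives `d = (cap − 1) / k`, and `e^(λ·d) = cap` gives `d = ln(cap) / λ`. With the example values from this plan (`k = 0.01`, `λ = 0.005`, `cap = 2.0`) that works out to about 100 days for linear decay and about 139 days for exponential decay. The sketch below uses a hypothetical `daysUntilCap` helper.

```typescript
// Mirrors the DecayPolicy YAML above; "custom" policies are out of scope here.
interface DecayPolicySketch {
  kind: "linear" | "exponential";
  k?: number;      // linear slope per day
  lambda?: number; // exponential rate per day
  cap: number;     // maximum multiplier
}

// Number of days after which D_t stops growing because it hit the cap.
function daysUntilCap(policy: DecayPolicySketch): number {
  if (policy.cap <= 1) return 0; // the multiplier starts at 1, so it is capped immediately
  if (policy.kind === "linear") {
    const k = policy.k ?? 0;
    return k > 0 ? (policy.cap - 1) / k : Infinity;
  }
  const lambda = policy.lambda ?? 0;
  return lambda > 0 ? Math.log(policy.cap) / lambda : Infinity;
}

console.log(daysUntilCap({ kind: "linear", k: 0.01, cap: 2.0 }));
console.log(daysUntilCap({ kind: "exponential", lambda: 0.005, cap: 2.0 }));
```

After that point, age no longer raises `UR_t`, so the age-based gate and median-age SLO are what keep pressure on very old unknowns.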
### Scanner hooks (where to emit Unknowns)
* **Binary scan:** stripped ELF/Mach-O/PE; missing build ID; abnormal sections; impossible symbol map.
* **Package map:** PURL inferred from path without registry proof; mismatched checksum; vendor fork detected.
* **Attestation:** DSSE missing / invalid; Sigstore chain unverifiable; Rekor entry not found.
* **Feeds:** component seen in runtime but absent from SBOM (or vice versa).
### Deterministic generation (for replay/audits)
* Include **Unknowns** in the **Scan Manifest** (your deterministic bundle): inputs, ruleset hash, feed hashes, lattice policy version, and the exact classifier thresholds that produced `B`, `A`, `T`. That lets you replay and reproduce UR_t byte-for-byte during audits.
### API surface (StellaOps.Authority)
```
POST /unknowns/ingest # bulk ingest from Scanner/Vexer
GET /unknowns?imageDigest=… # list + filters (status, age, UR buckets)
PATCH /unknowns/{id}/resolve # set status, add evidence, set suppression (with expiry)
GET /unknowns/stats # burn-downs, age histograms, P90 UR_t, top-N contributors
```
### UI slices (Trust Algebra Studio)
* **Risk ribbon:** Unknowns count, P90 UR_t, median age, trend sparkline.
* **Aging board:** columns by age buckets; cards show reason, UR_t, `T`, decay policy, evidence.
* **What-if slider:** adjust `k`/`λ` and see retroactive effect on release readiness (deterministic preview).
* **Explainability panel:** show `B`, `A`, `T`, `D_t` factors with succinct evidence (e.g., “ELF stripped; no .symtab; no Sigstore; runtime hits PID 1 → A=0.2; trust=0.1; day 17 → D=1.17”).
### Heuristics for `baseUnkScore` (`B`) (portable across ELF/PE/Mach-O)
* Section/segment anomalies; entropy outliers; import tables linking to risky APIs; executable heap/stack flags.
* Location & role (PATH proximity to entrypoint, init scripts).
* Size/type priors (tiny droppers vs fat libs).
* Known-unknown patterns (packer signatures, `UPX` traces without an unpack attestation).
### Governance & reporting
* **Weekly Unknowns Report:** totals, burn-down, median age, “top unresolved origins” (by registry / base image), and “time-to-first-evidence.”
* **Vendor scorecards:** if a supplier causes recurring unknowns, reflect it in **Adaptive Trust Economics**.
### Minimal implementation slice (2 sprints)
1. **Sprint A (backend + ingest):** model + tables, ingest endpoint, deterministic manifest hook, basic scoring (linear decay), list/query API, rollup metrics.
2. **Sprint B (UI + policy):** Unknowns board, what-if slider, suppress/resolve workflow w/ expiry, weekly report, policy gates in CI.
---
If you want, I can generate:
* the exact **PostgreSQL schema + indices** (including age/UR materialized views),
* **.NET 10** DTOs, handlers, and a **deterministic scoring library** with unit tests,
* a **Grafana/ClickHouse** dashboard spec for burn-down and P90 tracking,
* and a one-page client-facing explainer your sales team can use.
Perfect, let's turn the concept into a concrete, dev-ready implementation plan.
I'll assume:
* You already have:
* A scanner pipeline (binaries, SBOMs, attestations)
* A backend service (StellaOps.Authority)
* A UI (Trust Algebra Studio)
* Observability (OpenTelemetry, ClickHouse/Presto)
You can adapt naming and tech stack as needed.
---
## 0. Scope & success criteria
**Goals**
1. Persist all “unknown-ish” scanner findings (stripped binaries, unverifiable PURLs, missing digests, etc.) as first-class entities.
2. Compute a deterministic **Unknown Risk (UR)** per artifact and roll it up per image/application.
3. Apply **trust-decay** over time and expose burn-down metrics.
4. Provide UI workflows to triage, suppress, and resolve unknowns.
5. Enforce release gates based on unknown risk and age.
**Non-goals (for v1)**
* No full ML; use deterministic heuristics + tunable weights.
* No cross-org multi-tenant policy; assume a single org and a single policy set.
* No per-developer responsibility/assignment yet (can add later).
---
## 1. Architecture & components
### 1.1 New/updated components
1. **Unknowns Registry (backend submodule)**
* Lives in your existing backend (e.g., `StellaOps.Authority.Unknowns`).
* Owns DB schema, scoring logic, and API.
2. **Scanner integration**
* Extend `StellaOps.Scanner` (and/or `Vexer`) to emit “unknown” findings into the registry via HTTP or message bus.
3. **UI: Unknowns in Trust Algebra Studio**
* New section/tab: “Unknowns” under each image/app.
* Global “Unknowns board” for portfolio view.
4. **Analytics & jobs**
* Periodic job to recompute trust-decay & UR.
* Weekly report generator (e.g., pushing into ClickHouse, Slack, or email).
---
## 2. Data model (DB schema)
Use a relational DB; here's a concrete schema you can translate into migrations.
### 2.1 Tables
#### `unknown_artifacts`
Represents the current state of each unknown.
* `id` (UUID, PK)
* `created_at` (timestamp)
* `updated_at` (timestamp)
* `first_observed_at` (timestamp, NOT NULL)
* `last_observed_at` (timestamp, NOT NULL)
* `origin_source` (enum: `scanner`, `runtime`, `ingest`)
* `origin_feed` (text): e.g., `binary-scanner@1.4.3`
* `origin_scan_id` (UUID / text): foreign key to `scan_runs` if you have it
* `image_digest` (text, indexed): ties the unknown to a container/image
* `component_id` (UUID, nullable): SBOM component when later mapped
* `file_path` (text, nullable)
* `build_id` (text, nullable): ELF/Mach-O/PE build ID if any
* `purl` (text, nullable)
* `hash_sha256` (text, nullable)
* `cpe` (text, nullable)
* `classification_type` (enum: `binary`, `library`, `package`, `script`, `config`, `other`)
* `classification_reason` (enum:
`stripped_binary`, `missing_signature`, `no_feed_match`,
`ambiguous_name`, `checksum_mismatch`, `other`)
* `status` (enum:
`unresolved`, `suppressed`, `mitigated`, `confirmed_benign`, `confirmed_risk`)
* `status_changed_at` (timestamp)
* `status_changed_by` (text / user-id)
* `notes` (text)
* `decay_policy_id` (FK → `decay_policies`)
* `base_unk_score` (double, 0..1)
* `env_amplifier` (double, 0..1)
* `trust` (double, 0..1)
* `current_decay_multiplier` (double)
* `current_ur` (double, 0..1): Unknown Risk at last recompute
* `current_confidence` (double, 0..1): confidence in `current_ur`
* `is_deleted` (bool): soft delete
**Indexes**
* `idx_unknown_artifacts_image_digest_status`
* `idx_unknown_artifacts_status_created_at`
* `idx_unknown_artifacts_current_ur`
* `idx_unknown_artifacts_last_observed_at`
#### `unknown_artifact_events`
Append-only event log for auditable changes.
* `id` (UUID, PK)
* `unknown_artifact_id` (FK → `unknown_artifacts`)
* `created_at` (timestamp)
* `actor` (text / user-id / system)
* `event_type` (enum:
`created`, `reobserved`, `status_changed`, `note_added`,
`metrics_recomputed`, `linked_component`, `suppression_applied`, `suppression_expired`)
* `payload` (JSONB): diff or event-specific details
Index: `idx_unknown_artifact_events_artifact_id_created_at`
#### `decay_policies`
Defines how trust-decay works.
* `id` (text, PK): e.g., `decay:default:v1`
* `kind` (enum: `linear`, `exponential`)
* `param_k` (double, nullable): slope, for linear
* `param_lambda` (double, nullable): rate, for exponential
* `cap` (double, default 2.0)
* `description` (text)
* `is_default` (bool)
#### `unknown_suppressions`
Optional; can also reuse `unknown_artifacts.status` but separate table lets you have multiple suppressions over time.
* `id` (UUID, PK)
* `unknown_artifact_id` (FK)
* `created_at` (timestamp)
* `created_by` (text)
* `reason` (text)
* `expires_at` (timestamp, nullable)
* `active` (bool)
Index: `idx_unknown_suppressions_artifact_active_expires_at`
#### `unknown_image_rollups`
Precomputed rollups per image (for fast dashboards/gates).
* `id` (UUID, PK)
* `image_digest` (text, indexed)
* `computed_at` (timestamp)
* `unknown_count_total` (int)
* `unknown_count_unresolved` (int)
* `unknown_count_high_ur` (int), e.g., UR ≥ 0.8
* `p50_ur` (double)
* `p90_ur` (double)
* `top_n_ur_sum` (double)
* `median_age_days` (double)
---
## 3. Scoring engine implementation
Create a small, deterministic scoring library so the same code can be used in:
* Backend ingest path (for immediate UR)
* Batch recompute job
* “What-if” UI simulations (optionally via stateless API)
### 3.1 Data types
Define a core model, e.g.:
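```ts
// Decay policy parameters; mirrors a row of `decay_policies`.
type DecayPolicy = {
  kind: "linear" | "exponential";
  k?: number;      // slope, used when kind === "linear"
  lambda?: number; // rate, used when kind === "exponential"
  cap: number;     // upper bound on the decay multiplier
};

type UnknownMetricsInput = {
  baseUnkScore: number; // B
  envAmplifier: number; // A
  trust: number;        // T
  daysOpen: number;     // t
  decayPolicy: DecayPolicy;
};

type UnknownMetricsOutput = {
  decayMultiplier: number; // D_t
  unknownRisk: number;     // UR_t
};
```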
### 3.2 Algorithm
```ts
// D_t: grows with the time an unknown stays open, bounded by policy.cap.
function computeDecayMultiplier(
  daysOpen: number,
  policy: DecayPolicy
): number {
  if (policy.kind === "linear") {
    const raw = 1 + (policy.k ?? 0) * daysOpen;
    return Math.min(raw, policy.cap);
  }
  if (policy.kind === "exponential") {
    const lambda = policy.lambda ?? 0;
    const raw = Math.exp(lambda * daysOpen);
    return Math.min(raw, policy.cap);
  }
  return 1;
}

// UR_t = B * (1 + A) * D_t * (1 - T), clamped to 0..1.
function computeUnknownRisk(input: UnknownMetricsInput): UnknownMetricsOutput {
  const { baseUnkScore: B, envAmplifier: A, trust: T, daysOpen, decayPolicy } = input;
  const D_t = computeDecayMultiplier(daysOpen, decayPolicy);
  const raw = (B * (1 + A)) * D_t * (1 - T);
  const unknownRisk = Math.max(0, Math.min(raw, 1)); // clamp 0..1
  return { decayMultiplier: D_t, unknownRisk };
}
```
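To make the formula concrete, here is a small worked example. It is only a sketch: the input values and the linear slope `k = 0.01` are assumed for illustration, and the two helper functions restate the algorithm above in compact form so the snippet runs on its own.

```ts
type DecayPolicy = { kind: "linear" | "exponential"; k?: number; lambda?: number; cap: number };

function decayMultiplier(daysOpen: number, p: DecayPolicy): number {
  const raw = p.kind === "linear"
    ? 1 + (p.k ?? 0) * daysOpen
    : Math.exp((p.lambda ?? 0) * daysOpen);
  return Math.min(raw, p.cap);
}

function unknownRisk(B: number, A: number, T: number, daysOpen: number, p: DecayPolicy): number {
  const raw = B * (1 + A) * decayMultiplier(daysOpen, p) * (1 - T);
  return Math.max(0, Math.min(raw, 1)); // clamp 0..1
}

// Assumed inputs: B = 0.7, A = 0.2, T = 0.1, open for 17 days, linear decay.
const linear: DecayPolicy = { kind: "linear", k: 0.01, cap: 2.0 };
const dT = decayMultiplier(17, linear);             // 1 + 0.01 * 17 = 1.17
const ur = unknownRisk(0.7, 0.2, 0.1, 17, linear);  // 0.7 * 1.2 * 1.17 * 0.9 ≈ 0.885

// A long-open, untrusted unknown: D_t hits the cap (2.0), raw = 0.9 * 1.5 * 2.0 = 2.7,
// and the final clamp brings UR back to 1.
const clamped = unknownRisk(0.9, 0.5, 0.0, 200, linear);

console.log({ dT, ur, clamped });
```

Note how the clamp, not the cap, is what keeps UR inside 0..1: the cap only bounds `D_t`, so a high `B` with a high `A` can still push the raw product above 1.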
### 3.3 Heuristics for `B`, `A`, `T`
Implement these as pure functions with configuration-driven weights:
* `B` (base unknown score):
  * Start from a prior by `classification_type` (binary > library > config).
  * Adjust up for:
    * Stripped binary (no symbols, high entropy)
    * Suspicious segments (executable stack/heap)
    * Known packer signatures (UPX, etc.)
  * Adjust down for:
    * Large, well-known dependency path (`/usr/lib/...`)
    * Known safe signatures (if partially known).
* `A` (environment amplifier):
  * +0.2 if the artifact is part of the container entrypoint (PID 1).
  * +0.1 if the file is in a PATH dir (e.g., `/usr/local/bin`).
  * +0.1 if the runtime has network access or capability flags.
  * Cap at 0.5 for v1.
* `T` (trust):
  * Start at 0.5.
  * +0.3 if registry/signature/attestation chain verified.
  * +0.1 if the source registry is on the “trusted vendor list”.
  * −0.3 if checksum mismatch or feed conflict.
  * Clamp to 0..1.
Store the raw factors (`B`, `A`, `T`) on the artifact for transparency and later replays.
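A minimal sketch of the `A` and `T` heuristics as pure functions follows. The signal field names are hypothetical, the weights are hard-coded here although they would come from configuration, and the checksum/feed-conflict adjustment is read as a penalty (−0.3). `B` is left out because its priors are not specified numerically above.

```ts
// Hypothetical signal shapes; field names are assumptions for illustration.
type EnvSignals = { isEntrypoint: boolean; inPathDir: boolean; hasNetworkCapabilities: boolean };
type TrustSignals = {
  attestationVerified: boolean;
  trustedVendorRegistry: boolean;
  checksumMismatchOrFeedConflict: boolean;
};

const clamp = (x: number, lo: number, hi: number): number => Math.max(lo, Math.min(x, hi));

// A: additive bumps, capped at 0.5 for v1.
function envAmplifier(s: EnvSignals): number {
  let a = 0;
  if (s.isEntrypoint) a += 0.2;
  if (s.inPathDir) a += 0.1;
  if (s.hasNetworkCapabilities) a += 0.1;
  return clamp(a, 0, 0.5);
}

// T: starts at 0.5, moves with verification evidence, clamped to 0..1.
function trust(s: TrustSignals): number {
  let t = 0.5;
  if (s.attestationVerified) t += 0.3;
  if (s.trustedVendorRegistry) t += 0.1;
  if (s.checksumMismatchOrFeedConflict) t -= 0.3; // assumed to be a penalty
  return clamp(t, 0, 1);
}

const a = envAmplifier({ isEntrypoint: true, inPathDir: true, hasNetworkCapabilities: false });
const tVerified = trust({ attestationVerified: true, trustedVendorRegistry: true, checksumMismatchOrFeedConflict: false });
const tConflict = trust({ attestationVerified: false, trustedVendorRegistry: false, checksumMismatchOrFeedConflict: true });
console.log({ a, tVerified, tConflict });
```

Keeping these functions free of I/O is what makes later replays possible: the stored raw signals can be re-scored under a new weight set without touching the scanner.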
---
## 4. Scanner integration
### 4.1 Emission format (from scanner → backend)
Define a minimal ingestion contract (JSON over HTTP or a message):
```jsonc
{
"scanId": "urn:scan:1234",
"imageDigest": "sha256:abc123...",
"observedAt": "2025-11-27T12:34:56Z",
"unknowns": [
{
"externalId": "scanner-unique-id-1",
"originSource": "scanner",
"originFeed": "binary-scanner@1.4.3",
"filePath": "/usr/local/bin/stripped",
"buildId": null,
"purl": null,
"hashSha256": "aa...",
"cpe": null,
"classificationType": "binary",
"classificationReason": "stripped_binary",
"rawSignals": {
"entropy": 7.4,
"hasSymbols": false,
"isEntrypoint": true,
"inPathDir": true
}
}
]
}
```
The backend maps `rawSignals` → `B`, `A`, `T`.
### 4.2 Idempotency
* Define the uniqueness key on `(image_digest, file_path, hash_sha256)` for v1.
* On ingest:
  * If an artifact exists:
    * Update `last_observed_at`.
    * Recompute age (`now - first_observed_at`) and UR.
    * Add a `reobserved` event.
  * If not:
    * Insert a new row with `first_observed_at = observedAt`.
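The idempotency rule can be sketched with an in-memory store. This is an illustration only: the class and field names are hypothetical, a real implementation would use a unique index and an upsert in the database, and the `created` event on first insert plus the guard against out-of-order observations are assumptions not spelled out above.

```ts
type UnknownRow = {
  imageDigest: string;
  filePath: string;
  hashSha256: string;
  firstObservedAt: Date;
  lastObservedAt: Date;
  events: string[]; // stand-in for rows in unknown_artifact_events
};

class InMemoryUnknownStore {
  private rows = new Map<string, UnknownRow>();

  ingest(imageDigest: string, filePath: string, hashSha256: string, observedAt: Date): UnknownRow {
    // v1 uniqueness key: (image_digest, file_path, hash_sha256).
    const key = JSON.stringify([imageDigest, filePath, hashSha256]);
    const existing = this.rows.get(key);
    if (existing) {
      // Re-observation: keep first_observed_at, advance last_observed_at, log an event.
      if (observedAt.getTime() > existing.lastObservedAt.getTime()) {
        existing.lastObservedAt = observedAt;
      }
      existing.events.push("reobserved");
      return existing;
    }
    const row: UnknownRow = {
      imageDigest, filePath, hashSha256,
      firstObservedAt: observedAt,
      lastObservedAt: observedAt,
      events: ["created"],
    };
    this.rows.set(key, row);
    return row;
  }

  count(): number {
    return this.rows.size;
  }
}

const store = new InMemoryUnknownStore();
const t0 = new Date("2025-11-10T00:00:00Z");
const t1 = new Date("2025-11-27T00:00:00Z");
store.ingest("sha256:abc", "/usr/local/bin/stripped", "aa11", t0);
// Same key on the second scan: the row is updated, not duplicated.
const row = store.ingest("sha256:abc", "/usr/local/bin/stripped", "aa11", t1);
const ageDays = (t1.getTime() - row.firstObservedAt.getTime()) / 86_400_000;
console.log(store.count(), row.events, ageDays);
```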
### 4.3 HTTP endpoint
`POST /internal/unknowns/ingest`
* Auth: internal service token.
* Returns a per-unknown mapping to the internal `id` and the computed UR.
Error handling:
* If invalid payload → 400 with list of errors.
* Partial failure: process valid unknowns, return `failedUnknowns` array with reasons.
---
## 5. Backend API for UI & CI
### 5.1 List unknowns
`GET /unknowns`
Query params:
* `imageDigest` (optional)
* `status` (optional multi: unresolved, suppressed, etc.)
* `minUr`, `maxUr` (optional)
* `maxAgeDays` (optional)
* `page`, `pageSize`
Response:
```jsonc
{
  "items": [
    {
      "id": "urn:stella:unknowns:uuid",
      "imageDigest": "sha256:...",
      "filePath": "/usr/local/bin/stripped",
      "classificationType": "binary",
      "classificationReason": "stripped_binary",
      "status": "unresolved",
      "firstObservedAt": "...",
      "lastObservedAt": "...",
      "ageDays": 17,
      "baseUnkScore": 0.7,
      "envAmplifier": 0.2,
      "trust": 0.1,
      "decayPolicyId": "decay:default:v1",
      "decayMultiplier": 1.17,
      "currentUr": 0.88, // 0.7 * (1 + 0.2) * 1.17 * (1 - 0.1), rounded
      "currentConfidence": 0.8
    }
  ],
  "total": 123
}
```
### 5.2 Get single unknown + event history
`GET /unknowns/{id}`
Include:
* The artifact.
* Latest metrics.
* Recent events (with pagination).
### 5.3 Update status / suppression
`PATCH /unknowns/{id}`
Body options:
```jsonc
{
"status": "suppressed",
"notes": "Reviewed; internal diagnostics binary.",
"suppression": {
"expiresAt": "2025-12-31T00:00:00Z"
}
}
```
Backend:
* Validates the transition (e.g., moving from `suppressed` back to `unresolved` must be recorded as an event).
* Writes to `unknown_suppressions`.
* Writes `status_changed` + `suppression_applied` events.
### 5.4 Image rollups
`GET /images/{imageDigest}/unknowns/summary`
Response:
```jsonc
{
"imageDigest": "sha256:...",
"computedAt": "...",
"unknownCountTotal": 40,
"unknownCountUnresolved": 30,
"unknownCountHighUr": 4,
"p50Ur": 0.35,
"p90Ur": 0.82,
"topNUrSum": 2.4,
"medianAgeDays": 9
}
```
This is what CI and UI will mostly query.
---
## 6. Trust-decay job & rollup computation
### 6.1 Periodic recompute job
Schedule (e.g., every hour):
1. Fetch `unknown_artifacts` where:
   * `status IN ('unresolved', 'suppressed', 'mitigated')`
   * `last_observed_at >= now() - interval '90 days'` (tunable)
2. Compute `daysOpen` as the number of days between `first_observed_at` and `now()`.
3. Compute `D_t` and `UR_t` with the scoring library.
4. Update `unknown_artifacts.current_ur` and `current_decay_multiplier`.
5. Append a `metrics_recomputed` event, gated by a change threshold (e.g., only when UR changed by more than 0.01).
### 6.2 Rollup job
Every X minutes:
1. For each `image_digest` with active unknowns:
   * Compute:
     * `unknown_count_total`
     * `unknown_count_unresolved` (`status = unresolved`)
     * `unknown_count_high_ur` (UR ≥ threshold)
     * `p50` / `p90` UR (use DB percentile or compute in app)
     * `top_n_ur_sum` (sum of top 5 UR)
     * `median_age_days`
2. Upsert into `unknown_image_rollups`.
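If the rollup is computed in the app rather than in the database, it could look like the sketch below. The nearest-rank percentile convention is an assumption (a database `percentile_cont` would interpolate and give slightly different values), and the type and function names are hypothetical.

```ts
type UnknownForRollup = { status: string; ur: number; ageDays: number };

// Nearest-rank percentile on an ascending-sorted array (assumed convention).
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  return sortedAsc[Math.max(rank, 1) - 1];
}

function computeRollup(unknowns: UnknownForRollup[], highUrThreshold = 0.8, topN = 5) {
  const urs = unknowns.map(u => u.ur).sort((x, y) => x - y);
  const ages = unknowns.map(u => u.ageDays).sort((x, y) => x - y);
  const top = topN > 0 ? urs.slice(-topN) : [];
  return {
    unknownCountTotal: unknowns.length,
    unknownCountUnresolved: unknowns.filter(u => u.status === "unresolved").length,
    unknownCountHighUr: urs.filter(ur => ur >= highUrThreshold).length,
    p50Ur: percentile(urs, 50),
    p90Ur: percentile(urs, 90),
    topNUrSum: top.reduce((sum, x) => sum + x, 0),
    medianAgeDays: percentile(ages, 50),
  };
}

const rollup = computeRollup([
  { status: "unresolved", ur: 0.9, ageDays: 20 },
  { status: "unresolved", ur: 0.3, ageDays: 5 },
  { status: "suppressed", ur: 0.5, ageDays: 10 },
  { status: "unresolved", ur: 0.85, ageDays: 30 },
]);
console.log(rollup);
```

Whichever percentile convention is chosen, it is worth fixing it in one place, since the CI gate and the dashboard both read these values.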
---
## 7. CI / promotion gating
Expose a simple policy evaluation API for CI and deploy pipelines.
### 7.1 Policy definition (config)
Example YAML:
```yaml
unknownsPolicy:
  blockIf:
    - kind: "anyUrAboveThreshold"
      threshold: 0.8
    - kind: "countAboveAge"
      maxCount: 5
      ageDays: 14
  warnIf:
    - kind: "unknownCountAbove"
      maxCount: 50
```
### 7.2 Policy evaluation endpoint
`GET /policy/unknowns/evaluate?imageDigest=sha256:...`
Response:
```jsonc
{
"imageDigest": "sha256:...",
"result": "block", // "ok" | "warn" | "block"
"reasons": [
{
"kind": "anyUrAboveThreshold",
"detail": "1 unknown with UR>=0.8 (max allowed: 0)"
}
],
"summary": {
"unknownCountUnresolved": 30,
"p90Ur": 0.82,
"medianAgeDays": 17
}
}
```
CI can decide to fail build/deploy based on `result`.
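A possible shape for the evaluator behind this endpoint is sketched below. The exact semantics of each rule kind are assumptions (for instance, `countAboveAge` is read as "more than `maxCount` unknowns older than `ageDays`"), and the caller is assumed to pass only the unknowns that should count toward the gate.

```ts
type Rule =
  | { kind: "anyUrAboveThreshold"; threshold: number }
  | { kind: "countAboveAge"; maxCount: number; ageDays: number }
  | { kind: "unknownCountAbove"; maxCount: number };

type Policy = { blockIf: Rule[]; warnIf: Rule[] };
type GateUnknown = { ur: number; ageDays: number };
type Reason = { kind: string; detail: string };
type Verdict = { result: "ok" | "warn" | "block"; reasons: Reason[] };

// Returns a reason when the rule fires, otherwise null.
function check(rule: Rule, unknowns: GateUnknown[]): Reason | null {
  switch (rule.kind) {
    case "anyUrAboveThreshold": {
      const threshold = rule.threshold;
      const n = unknowns.filter(u => u.ur >= threshold).length;
      return n > 0
        ? { kind: rule.kind, detail: `${n} unknown(s) with UR>=${threshold} (max allowed: 0)` }
        : null;
    }
    case "countAboveAge": {
      const ageDays = rule.ageDays;
      const n = unknowns.filter(u => u.ageDays > ageDays).length;
      return n > rule.maxCount
        ? { kind: rule.kind, detail: `${n} unknown(s) older than ${ageDays} days (max allowed: ${rule.maxCount})` }
        : null;
    }
    case "unknownCountAbove":
      return unknowns.length > rule.maxCount
        ? { kind: rule.kind, detail: `${unknowns.length} unknown(s) (max allowed: ${rule.maxCount})` }
        : null;
  }
}

function evaluate(policy: Policy, unknowns: GateUnknown[]): Verdict {
  const fired = (rules: Rule[]): Reason[] =>
    rules.map(r => check(r, unknowns)).filter((r): r is Reason => r !== null);
  const blocks = fired(policy.blockIf);
  if (blocks.length > 0) return { result: "block", reasons: blocks };
  const warns = fired(policy.warnIf);
  return warns.length > 0 ? { result: "warn", reasons: warns } : { result: "ok", reasons: [] };
}

const policy: Policy = {
  blockIf: [
    { kind: "anyUrAboveThreshold", threshold: 0.8 },
    { kind: "countAboveAge", maxCount: 5, ageDays: 14 },
  ],
  warnIf: [{ kind: "unknownCountAbove", maxCount: 50 }],
};
const verdict = evaluate(policy, [{ ur: 0.82, ageDays: 3 }, { ur: 0.2, ageDays: 40 }]);
console.log(verdict);
```

Block rules are evaluated before warn rules, so a single response never mixes the two severities.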
---
## 8. UI implementation (Trust Algebra Studio)
### 8.1 Image detail page: “Unknowns” tab
Components:
1. **Header metrics ribbon**
* Unknowns unresolved, p90 UR, median age, weekly trend sparkline.
* Fetch from `/images/{digest}/unknowns/summary`.
2. **Unknowns table**
* Columns:
* Status pill
* UR (with color + tooltip showing `B`, `A`, `T`, `D_t`)
* Classification type/reason
* File path
* Age
* Last observed
* Filters:
* Status, UR range, age range, reason, type.
3. **Row drawer / detail panel**
* Show:
* All core fields.
* Evidence:
* origin (scanner, feed, runtime)
* raw signals (entropy, sections, etc)
* SBOM component link (if any)
* Timeline (events list)
* Actions:
* Change status (unresolved → suppressed/mitigated/confirmed).
* Add note.
* Set/extend suppression expiry.
### 8.2 Global “Unknowns board”
Goals:
* Portfolio view; triage across many images.
Features:
* Filters by:
* Team/application/service
* Time range for first observed
* UR bucket (0–0.3, 0.3–0.6, 0.6–1)
* Cards/rows per image:
* Unknown counts, p90 UR, median age.
* Trend of unknown count (last N weeks).
* Click through to the image-detail tab.
### 8.3 “What-if” slider (optional v1.1)
At image or org level:
* Slider(s) to visualize effect of:
* `k` / `lambda` change (decay speed).
* Trust baseline changes (simulate better attestations).
* Implement by calling a stateless endpoint:
* `POST /unknowns/what-if` with:
* Current unknowns list IDs
* Proposed decay policy
* Returns recalculated URs and hypothetical gate result (but does **not** persist).
---
## 9. Observability & analytics
### 9.1 Metrics
Emit structured events/metrics (OpenTelemetry, etc.):
* Counters:
* `unknowns_ingested_total` (labels: `source`, `classification_type`, `reason`)
* `unknowns_resolved_total` (labels: `status`)
* Gauges:
* `unknowns_unresolved_count` per image/service.
* `unknowns_p90_ur` per image/service.
* `unknowns_median_age_days`.
### 9.2 Weekly report generator
Batch job:
1. Compute, per org or team:
* Total unknowns.
* New unknowns this week.
* Resolved unknowns this week.
* Median age.
* Top 10 images by:
* Highest p90 UR.
* Largest number of long-lived unknowns (> X days).
2. Persist into analytics store (ClickHouse) + push into:
* Slack channel / email with a short plaintext summary and link to UI.
---
## 10. Security & compliance
* Ensure all APIs require authentication & proper scopes:
* Scanner ingest: internal service token only.
* UI APIs: user identity + RBAC (e.g., team can only see their images).
* Audit log:
* `unknown_artifact_events` must be immutable and queryable by compliance teams.
* PII:
* Avoid storing user PII in notes; if necessary, apply redaction.
---
## 11. Suggested delivery plan (sprints/epics)
### Sprint 1: Foundations & ingest path
* [ ] DB migrations: `unknown_artifacts`, `unknown_artifact_events`, `decay_policies`.
* [ ] Implement scoring library (`B`, `A`, `T`, `UR_t`, `D_t`).
* [ ] Implement `/internal/unknowns/ingest` endpoint with idempotency.
* [ ] Extend scanner to emit unknowns and integrate with ingest.
* [ ] Basic `GET /unknowns?imageDigest=...` API.
* [ ] Seed `decay:default:v1` policy.
**Exit criteria:** Unknowns created and UR computed from real scans; queryable via API.
---
### Sprint 2: Decay, rollups, and CI hook
* [ ] Implement periodic job to recompute decay & UR.
* [ ] Implement rollup job + `unknown_image_rollups` table.
* [ ] Implement `GET /images/{digest}/unknowns/summary`.
* [ ] Implement policy evaluation endpoint for CI.
* [ ] Wire CI to block/warn based on policy.
**Exit criteria:** CI gate can fail a build due to high-risk unknowns; rollups visible via API.
---
### Sprint 3: UI (Unknowns tab + board)
* [ ] Image detail “Unknowns” tab:
* Metrics ribbon, table, filters.
* Row drawer with evidence & history.
* [ ] Global “Unknowns board” page.
* [ ] Integrate with APIs.
* [ ] Add basic “explainability tooltip” for UR.
**Exit criteria:** Security team can triage unknowns via UI; product teams can see their exposure.
---
### Sprint 4: Suppression workflow & reporting
* [ ] Implement `PATCH /unknowns/{id}` + suppression rules & expiries.
* [ ] Extend periodic jobs to auto-expire suppressions.
* [ ] Weekly unknowns report job → analytics + Slack/email.
* [ ] Add “trend” sparklines and unknowns burndown in UI.
**Exit criteria:** Unknowns can be suppressed with justification; org gets weekly burndown trends.
---
If you'd like, I can next:
* Turn this into concrete tickets (Jira-style) with story points and acceptance criteria, or
* Generate example migration scripts (SQL) and API contract files (OpenAPI snippet) that your devs can copy-paste.


@@ -0,0 +1,766 @@
Here's a quick, practical heads-up on publishing attestations to Sigstore/Rekor without pain, plus a drop-in pattern you can adapt today.
---
## Why this matters (plain English)
* **Rekor** is a public transparency log for your build proofs.
* **DSSE attestations** (e.g., in-toto, SLSA) are uploaded **in full** rather than streamed, so big blobs hit **payload limits** and fail.
* Thousands of tiny attestations also hurt you: **API overhead, retries, and throttling** skyrocket.
The sweet spot: **chunk your evidence sensibly**, keep each DSSE envelope small enough for Rekor, and add **retry + resume** so partial batches don't nuke your whole publish step.
---
## Design rules of thumb
* **Target envelope size:** keep each DSSE (base64-encoded) comfortably **<1–2 MB** (tunable per your CI).
* **Shard by artifact + section:** e.g., split SBOMs by package namespace, split provenance by step/log segments, split test evidence by suite.
* **Stable chunking keys:** deterministic chunk IDs (e.g., `artifactDigest + section + seqNo`) so retries can **idempotently** republish.
* **Batch with backoff:** publish N envelopes, exponential backoff on 429/5xx, **resume from last success**.
* **Record mapping:** keep a **local index**: `chunkId → rekorUUID`, so you can later reconstruct the full evidence set.
* **Verify before delete:** only discard local chunk files **after** Rekor inclusion proof is verified.
* **Observability:** metrics for envelopes/s, bytes/s, retry count, and final inclusion rate.
---
## Minimal workflow (pseudo)
1. **Produce evidence** → split into chunks
2. **Wrap each chunk in DSSE** (sign once per chunk)
3. **Publish to Rekor** with retry + idempotency
4. **Store rekor UUID + inclusion proof**
5. **Emit a manifest** that lists all chunk IDs for downstream recomposition
---
## C# sketch (fits .NET 10 style)
```csharp
public sealed record ChunkRef(string Artifact, string Section, int Part, string ChunkId);
public sealed record PublishResult(ChunkRef Ref, string RekorUuid, string InclusionHash);

public interface IChunker {
    IEnumerable<(ChunkRef Ref, ReadOnlyMemory<byte> Payload)> Split(ArtifactEvidence evidence, int targetBytes);
}

public interface IDsseSigner {
    // Returns serialized DSSE envelope (JSON) ready to upload
    byte[] Sign(ReadOnlySpan<byte> payload, string payloadType);
}

public interface IRekorClient {
    // Idempotent publish: returns existing UUID if duplicate body digest
    Task<(string uuid, string inclusionHash)> UploadAsync(ReadOnlySpan<byte> dsseEnvelope, CancellationToken ct);
}

public sealed class Publisher {
    private readonly IChunker _chunker;
    private readonly IDsseSigner _signer;
    private readonly IRekorClient _rekor;
    private readonly ICheckpointStore _store; // chunkId -> (uuid, inclusionHash)

    public Publisher(IChunker c, IDsseSigner s, IRekorClient r, ICheckpointStore st) =>
        (_chunker, _signer, _rekor, _store) = (c, s, r, st);

    public async IAsyncEnumerable<PublishResult> PublishAsync(
        ArtifactEvidence ev, int targetBytes, string payloadType,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken ct = default)
    {
        foreach (var (refInfo, chunk) in _chunker.Split(ev, targetBytes)) {
            // Resume: skip chunks that were already published in an earlier run.
            if (_store.TryGet(refInfo.ChunkId, out var cached)) {
                yield return new PublishResult(refInfo, cached.uuid, cached.inclusionHash);
                continue;
            }

            var envelope = _signer.Sign(chunk.Span, payloadType);

            // Retry with jitter/backoff. The result is captured and yielded after the loop
            // because C# does not allow `yield return` inside a try block that has a catch.
            (string uuid, string incl) uploaded;
            var delay = TimeSpan.FromMilliseconds(200);
            for (int attempt = 1; ; attempt++) {
                try {
                    uploaded = await _rekor.UploadAsync(envelope, ct);
                    break;
                } catch (TransientHttpException) when (attempt < 6) {
                    await Task.Delay(delay + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 250)), ct);
                    delay = TimeSpan.FromMilliseconds(Math.Min(delay.TotalMilliseconds * 2, 5000));
                }
            }

            _store.Put(refInfo.ChunkId, uploaded.uuid, uploaded.incl);
            yield return new PublishResult(refInfo, uploaded.uuid, uploaded.incl);
        }
    }
}
```
**Notes:**
* Implement `IChunker` so splits are **deterministic** (e.g., package groups of an SBOM or line-bounded log slices).
* Make `IRekorClient.UploadAsync` **idempotent** by hashing the DSSE envelope and using Rekor's response on duplicates.
* `ICheckpointStore` can be a local SQLite/JSON file in CI artifacts; export it with your build.
---
## What to chunk (practical presets)
* **SBOM (CycloneDX/SPDX):** per dependency namespace/layer; keep each file ~300–800 KB before DSSE.
* **Provenance (in-toto/SLSA):** one DSSE per build step or per 10–50 KB of logs/evidence.
* **Test proofs:** group per suite; avoid single mega-JUnit JSONs.
---
## “Done” checklist
* [ ] Envelopes consistently under your Rekor size ceiling (leave 30–40% headroom).
* [ ] Idempotent retry with resume (no duplicate spam).
* [ ] Local index mapping `chunkId → rekorUUID` stored in CI artifacts.
* [ ] Inclusion proofs verified and archived.
* [ ] A recomposition manifest that lists all chunk IDs for auditors.
If you want, I can tailor this to StellaOps (naming, namespaces, and your Rekor mirror strategy) and drop in a ready-to-compile module for your `.NET 10` solution.
Cool, let's turn that sketch into something your devs can actually pick up and build.
I'll lay this out like an implementation guide: architecture, project layout, per-component specs, config, and a suggested rollout plan.
---
## 1. Objectives & constraints
**Primary goals**
* Publish DSSE attestations into Rekor:
* Avoid size limits (chunking).
* Avoid throttling (batching & retry).
* Ensure idempotency & resumability.
* Keep it **framework-agnostic** inside `.NET 10` (can run in any CI).
* Make verification/auditing easy (manifest + inclusion proofs).
**Non-functional**
* Deterministic behavior: same inputs ⇒ same chunk IDs & envelopes.
* Observable: metrics and logs for troubleshooting.
* Testable: clear seams/interfaces for mocking Rekor & signing.
---
## 2. High-level architecture
Core pipeline (per build / artifact):
1. **Evidence input**: you pass in provenance/SBOM/test data as `ArtifactEvidence`.
2. **Chunker**: splits oversized evidence into multiple chunks with stable IDs.
3. **DSSE Signer**: wraps each chunk in a DSSE envelope.
4. **Rekor client**: publishes envelopes to the Rekor log with retry/backoff.
5. **Checkpoint store**: remembers which chunks were already published.
6. **Manifest builder**: emits a manifest mapping artifact → all Rekor entries.
Text diagram:
```text
[ArtifactEvidence]
        |
        v
   IChunker ---> [ChunkRef + Payload] x N
        |
        v
  IDsseSigner ---> [DSSE Envelope] x N
        |
        v
  IRekorClient (with retry & backoff)
        |
        v
ICheckpointStore <--> ManifestBuilder
        |
        v
[attestations_manifest.json] + inclusion proofs
```
---
## 3. Project & namespace layout
Example solution layout:
```text
src/
  SupplyChain.Attestations.Core/
    Chunking/
    Signing/
    Publishing/
    Models/
    Manifest/
  SupplyChain.Attestations.Rekor/
    RekorClient/
    Models/
  SupplyChain.Attestations.Cli/
    Program.cs
    Commands/ # e.g., publish-attestations
tests/
  SupplyChain.Attestations.Core.Tests/
  SupplyChain.Attestations.Rekor.Tests/
  SupplyChain.Attestations.IntegrationTests/
```
You can of course rename to match your org.
---
## 4. Data models & contracts
### 4.1 Core domain models
```csharp
public sealed record ArtifactEvidence(
string ArtifactId, // e.g., image digest, package id, etc.
string ArtifactType, // "container-image", "nuget-package", ...
string ArtifactDigest, // canonical digest (sha256:...)
IReadOnlyList<EvidenceBlob> EvidenceBlobs // SBOM, provenance, tests, etc.
);
public sealed record EvidenceBlob(
string Section, // "sbom", "provenance", "tests", "logs"
string ContentType, // "application/json", "text/plain"
ReadOnlyMemory<byte> Content
);
public sealed record ChunkRef(
string ArtifactId,
string Section, // from EvidenceBlob.Section
int Part, // 0-based index
string ChunkId // stable identifier
);
```
**ChunkId generation rule (deterministic):**
```csharp
// Pseudo:
ChunkId = Base64Url( SHA256( $"{ArtifactDigest}|{Section}|{Part}" ) )
```
Store both `ChunkRef` and hashes in the manifest so it's reproducible.
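To illustrate the rule, here is a minimal sketch of the ChunkId derivation. It is written in TypeScript on Node's built-in `node:crypto` purely for illustration (the module itself is C#), and the function name `chunkId` is hypothetical.

```ts
import { createHash } from "node:crypto";

// ChunkId = Base64Url(SHA256("{ArtifactDigest}|{Section}|{Part}"))
function chunkId(artifactDigest: string, section: string, part: number): string {
  return createHash("sha256")
    .update(`${artifactDigest}|${section}|${part}`, "utf8")
    .digest("base64url");
}

const idA = chunkId("sha256:abcd", "sbom", 0);
const idAgain = chunkId("sha256:abcd", "sbom", 0); // same inputs, same ID
const idNext = chunkId("sha256:abcd", "sbom", 1);  // next part, different ID
console.log(idA, idNext);
```

Because the ID depends only on the artifact digest, section, and part index, it stays stable across retries only as long as the chunker produces the same parts in the same order; changing `TargetMaxBytes` between runs shifts part boundaries without changing the IDs, so the checkpoint file should be discarded when chunking options change.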
### 4.2 Rekor publication result
```csharp
public sealed record PublishResult(
    ChunkRef Ref,
    string RekorUuid,
    string InclusionHash, // hash used for inclusion proof
    string? LogIndex      // optional, if returned by Rekor
);
```
### 4.3 Manifest format
A single build emits `attestations_manifest.json`:
```jsonc
{
"schemaVersion": "1.0",
"buildId": "build-2025-11-27T12:34:56Z",
"artifact": {
"id": "my-app@sha256:abcd...",
"type": "container-image",
"digest": "sha256:abcd..."
},
"chunks": [
{
"chunkId": "aBcD123...",
"section": "sbom",
"part": 0,
"rekorUuid": "1234-5678-...",
"inclusionHash": "deadbeef...",
"logIndex": "42"
}
]
}
```
Define a C# model mirroring this and serialize with `System.Text.Json`.
---
## 5. Componentlevel design
### 5.1 Chunker
**Interface**
```csharp
public sealed record ChunkingOptions(
    int TargetMaxBytes, // e.g., 800_000 bytes pre-DSSE
    int HardMaxBytes    // e.g., 1_000_000 bytes pre-DSSE
);

public interface IChunker
{
    IEnumerable<(ChunkRef Ref, ReadOnlyMemory<byte> Payload)> Split(
        ArtifactEvidence evidence,
        ChunkingOptions options
    );
}
```
**Behavior**
* For each `EvidenceBlob`:
  * If `Content.Length <= TargetMaxBytes` → 1 chunk.
  * Else:
    * Split on **logical boundaries** if possible:
      * SBOM JSON: split by package list segments.
      * Logs: split by line boundaries.
      * Tests: split by test suite / file.
    * If not easily splittable (opaque binary), hard-chunk by byte window.
* Ensure **each chunk** respects `HardMaxBytes`.
* Generate `ChunkRef.Part` sequentially (0, 1, 2, …) per `(ArtifactId, Section)`.
* Generate `ChunkId` with the deterministic rule above.

**Implementation plan**
* Start with a **simple hard-byte chunker**:
  * Always split at `TargetMaxBytes` boundaries.
* Add optional **format-aware chunkers**:
  * `SbomChunkerDecorator`: detects JSON SBOM structure and splits on package groups.
  * `LogChunkerDecorator`: splits on lines.
* Use the decorator pattern or strategy pattern, all implementing `IChunker`.
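The two simplest strategies can be sketched as follows, in TypeScript for illustration only. `splitBytes` and `splitLines` are hypothetical names, and treating empty input as a single empty chunk is an assumption. The line-aware variant falls back to a hard cut when a single line is longer than the window.

```ts
// Hard-byte chunker: fixed windows of at most `targetBytes`.
function splitBytes(content: Uint8Array, targetBytes: number): Uint8Array[] {
  if (targetBytes <= 0) throw new RangeError("targetBytes must be positive");
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < content.length; off += targetBytes) {
    chunks.push(content.subarray(off, Math.min(off + targetBytes, content.length)));
  }
  return chunks.length > 0 ? chunks : [content]; // empty input -> one empty chunk (assumption)
}

// Line-aware chunker: cuts after the last "\n" inside each window when there is one.
function splitLines(content: Uint8Array, targetBytes: number): Uint8Array[] {
  if (targetBytes <= 0) throw new RangeError("targetBytes must be positive");
  const chunks: Uint8Array[] = [];
  let start = 0;
  while (start < content.length) {
    let end = Math.min(start + targetBytes, content.length);
    if (end < content.length) {
      const nl = content.lastIndexOf(0x0a, end - 1); // last newline at or before end - 1
      if (nl >= start) end = nl + 1;                 // keep the newline with its line
    }
    chunks.push(content.subarray(start, end));
    start = end;
  }
  return chunks.length > 0 ? chunks : [content];
}

const toBytes = (s: string): Uint8Array => Uint8Array.from(s, c => c.charCodeAt(0));
const toText = (b: Uint8Array): string => Array.from(b, x => String.fromCharCode(x)).join("");

const log = toBytes("aaa\nbbbb\ncc\n");
const byLine = splitLines(log, 6).map(toText);
const byByte = splitBytes(log, 6).map(toText);
console.log(byLine, byByte);
```

Both functions return views (`subarray`) over the input, so no evidence bytes are copied until the signer serializes them.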
---
### 5.2 DSSE signer
We abstract away how keys are managed.
**Interface**
```csharp
public interface IDsseSigner
{
// payload: raw bytes of the evidence chunk
// payloadType: DSSE payloadType string, e.g. "application/vnd.in-toto+json"
byte[] Sign(ReadOnlySpan<byte> payload, string payloadType);
}
```
**Responsibilities**
* Create DSSE envelope:
  * `payloadType`: from config (per section or global).
  * `payload`: base64url of the chunk.
  * `signatures`: one or more signatures (key ID + signature bytes).
* Serialize to **JSON** as a UTF-8 `byte[]`.

**Implementation plan**
* Implement `KeyBasedDsseSigner`:
  * Uses a configured private key (e.g., from a KMS, HSM, or file).
  * Accepts an `IDsseCryptoProvider` dependency for the actual signature primitive (RSA/ECDSA/Ed25519).
* Keep space for a future `KeylessDsseSigner` (Sigstore Fulcio/OIDC), but not required for v1.
**Config mapping**
* `payloadType` default: `"application/vnd.in-toto+json"`.
* Allow overrides per section: e.g., SBOM vs test logs.
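For orientation, the envelope construction might look like the sketch below (TypeScript for illustration; the real signer is the C# `IDsseSigner`). The signature primitive is injected as a callback, which is an assumption standing in for the crypto provider. Note that DSSE signs the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload; the spec accepts standard or URL-safe Base64, and this sketch uses standard Base64 via `btoa`.

```ts
type DsseEnvelope = {
  payloadType: string;
  payload: string; // Base64 of the raw chunk bytes
  signatures: { keyid: string; sig: string }[];
};

const utf8 = new TextEncoder();

// PAE(type, body) = "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body
function pae(payloadType: string, payload: Uint8Array): Uint8Array {
  const typeLen = utf8.encode(payloadType).length;
  const header = utf8.encode(`DSSEv1 ${typeLen} ${payloadType} ${payload.length} `);
  const out = new Uint8Array(header.length + payload.length);
  out.set(header, 0);
  out.set(payload, header.length);
  return out;
}

function toBase64(bytes: Uint8Array): string {
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin);
}

// `sign` is a hypothetical callback standing in for the configured key/crypto provider.
function buildEnvelope(
  payload: Uint8Array,
  payloadType: string,
  keyid: string,
  sign: (message: Uint8Array) => Uint8Array
): DsseEnvelope {
  const sig = sign(pae(payloadType, payload));
  return { payloadType, payload: toBase64(payload), signatures: [{ keyid, sig: toBase64(sig) }] };
}

// Demo with a fake "signature" (first 4 bytes of the PAE); not a real signature scheme.
const body = utf8.encode("hello world");
const paeText = new TextDecoder().decode(pae("http://example.com/HelloWorld", body));
const envelope = buildEnvelope(body, "http://example.com/HelloWorld", "demo-key", m => m.slice(0, 4));
console.log(paeText, JSON.stringify(envelope));
```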
---
### 5.3 Rekor client
**Interface**
```csharp
public interface IRekorClient
{
Task<(string Uuid, string InclusionHash, string? LogIndex)> UploadAsync(
ReadOnlySpan<byte> dsseEnvelope,
CancellationToken ct = default
);
}
```
**Responsibilities**
* Wrap HTTP client to Rekor:
* Build the proper Rekor entry for DSSE (log entry with DSSE envelope).
* Send HTTP POST to Rekor API.
* Parse UUID and inclusion information.
* Handle **duplicate entries**:
* If Rekor responds “entry already exists”, return the existing UUID instead of failing.
* Surface **clear exceptions**:
* `TransientHttpException` (for retryable 429/5xx).
* `PermanentHttpException` (4xx like 400/413).
**Implementation plan**
* Implement `RekorClient` using `IHttpClientFactory`.
* Add config:
* `BaseUrl` (e.g., your Rekor instance).
* `TimeoutSeconds`.
* `MaxRequestBodyBytes` (for safety).
**Retry classification**
* Retry on:
  * 429 (Too Many Requests).
  * 5xx (server errors).
  * Network timeouts / transient socket errors.
* No retry on:
  * Other 4xx (optionally treating 408 Request Timeout as retryable).
  * 413 Payload Too Large (signals a chunking issue).
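The classification can be expressed as one small pure function, sketched here in TypeScript. The function and type names are hypothetical, and mapping 409 to "duplicate" follows the "already exists" handling described for the client.

```ts
type RetryClass = "success" | "duplicate" | "transient" | "permanent";

// Maps an HTTP status from Rekor to a retry decision.
function classifyStatus(status: number, retryOn408 = false): RetryClass {
  if (status >= 200 && status < 300) return "success";
  if (status === 409) return "duplicate";                       // entry already exists
  if (status === 429 || status >= 500) return "transient";      // throttling / server errors
  if (status === 408 && retryOn408) return "transient";         // optional
  return "permanent";                                           // other 4xx, including 413
}

const classes = [201, 409, 429, 503, 400, 413, 408].map(s => classifyStatus(s));
console.log(classes);
```

Network timeouts and socket errors never produce a status code, so the client still needs a separate catch for those that maps them to the transient path.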
---
### 5.4 Checkpoint store
Used to allow **resume** and **idempotency**.
**Interface**
```csharp
public sealed record CheckpointEntry(
string ChunkId,
string RekorUuid,
string InclusionHash,
string? LogIndex
);
public interface ICheckpointStore
{
bool TryGet(string chunkId, out CheckpointEntry entry);
void Put(CheckpointEntry entry);
void Flush(); // to persist to disk or remote store
}
```
**Implementation plan (v1)**
* Use a simple **file-based JSON** store per build:
  * Path derived from build ID, e.g., `.attestations/checkpoints.json`.
  * Internal representation: `Dictionary<string, CheckpointEntry>`.
  * At end of run, `Flush()` writes out the file.
* On start of run, if the file exists:
  * Load existing checkpoints → supports resume.

**Future options**
* Plug in a distributed store (an `ICheckpointStore` implementation backed by Redis, SQL, etc.) for multi-stage pipelines.
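A file-backed store along these lines could be sketched as below (TypeScript on Node's `node:fs`, for illustration; class and method names mirror the C# interface but are otherwise hypothetical). The demo writes to the OS temp directory and then constructs a second store on the same path to simulate a resumed run.

```ts
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

type CheckpointEntry = { chunkId: string; rekorUuid: string; inclusionHash: string; logIndex?: string };

class FileCheckpointStore {
  private readonly path: string;
  private readonly entries = new Map<string, CheckpointEntry>();

  constructor(path: string) {
    this.path = path;
    // Resume support: load whatever an earlier run flushed.
    if (existsSync(path)) {
      const loaded = JSON.parse(readFileSync(path, "utf8")) as Record<string, CheckpointEntry>;
      for (const id of Object.keys(loaded)) this.entries.set(id, loaded[id]);
    }
  }

  tryGet(chunkId: string): CheckpointEntry | undefined {
    return this.entries.get(chunkId);
  }

  put(entry: CheckpointEntry): void {
    this.entries.set(entry.chunkId, entry);
  }

  flush(): void {
    const snapshot: Record<string, CheckpointEntry> = {};
    this.entries.forEach((entry, id) => { snapshot[id] = entry; });
    mkdirSync(dirname(this.path), { recursive: true });
    writeFileSync(this.path, JSON.stringify(snapshot, null, 2));
  }
}

const demoPath = join(tmpdir(), `attestation-checkpoints-${Date.now()}.json`);
const firstRun = new FileCheckpointStore(demoPath);
firstRun.put({ chunkId: "chunk-0", rekorUuid: "uuid-demo", inclusionHash: "deadbeef" });
firstRun.flush();

const resumedRun = new FileCheckpointStore(demoPath); // simulates a re-run of the pipeline
const restored = resumedRun.tryGet("chunk-0");
console.log(restored);
```

Flushing only at the end keeps the sketch short, but it also means a crash loses all progress from that run; flushing after every successful upload trades a little I/O for a tighter resume point.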
---
### 5.5 Publisher / Orchestrator
Use a slightly enhanced version of what we sketched before.
**Interface**
```csharp
public sealed record AttestationPublisherOptions(
    int TargetChunkBytes,
    int HardChunkBytes,
    string PayloadType,
    int MaxAttempts,
    TimeSpan InitialBackoff,
    TimeSpan MaxBackoff
);

public sealed class AttestationPublisher
{
    public AttestationPublisher(
        IChunker chunker,
        IDsseSigner signer,
        IRekorClient rekor,
        ICheckpointStore checkpointStore,
        ILogger<AttestationPublisher> logger,
        AttestationPublisherOptions options
    ) { ... }

    // Body elided; it follows the algorithm described below.
    public async IAsyncEnumerable<PublishResult> PublishAsync(
        ArtifactEvidence evidence,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken ct = default
    ) { ... }
}
```
**Algorithm**
For each `(ChunkRef, Payload)` from `IChunker.Split`:
1. Check `ICheckpointStore.TryGet(ChunkId)`:
   * If found → yield the cached `PublishResult` (idempotency).
2. Build the DSSE envelope via `_signer.Sign(payload, options.PayloadType)`.
3. Retry loop:
   * Try `_rekor.UploadAsync(envelope, ct)`.
   * On success:
     * Create a `CheckpointEntry` and store it via `_checkpointStore.Put`.
     * Yield `PublishResult`.
   * On `TransientHttpException`:
     * If attempts ≥ `MaxAttempts` → surface as failure.
     * Else → exponential backoff with jitter, then repeat.
   * On `PermanentHttpException`:
     * Log the error and surface it (no retry).
At the end of the run, call `_checkpointStore.Flush()`.
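The retry loop in step 3 can be isolated into a small helper, sketched here in TypeScript for illustration. `TransientError`, `withRetry`, and the injected `wait` function are hypothetical names, and the jitter range (up to half the current delay) is an assumption; injecting `wait` lets tests run without real sleeping.

```ts
class TransientError extends Error {
  readonly transient = true;
}

type BackoffOptions = { maxAttempts: number; initialBackoffMs: number; maxBackoffMs: number };

const sleep = (ms: number): Promise<void> => new Promise(resolve => setTimeout(resolve, ms));

const isTransient = (err: unknown): boolean =>
  err instanceof Error && (err as { transient?: boolean }).transient === true;

async function withRetry<T>(
  op: () => Promise<T>,
  o: BackoffOptions,
  wait: (ms: number) => Promise<void> = sleep
): Promise<T> {
  let delay = o.initialBackoffMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Permanent errors and exhausted attempts surface to the caller.
      if (!isTransient(err) || attempt >= o.maxAttempts) throw err;
      const jitter = Math.random() * (delay / 2);
      await wait(delay + jitter);
      delay = Math.min(delay * 2, o.maxBackoffMs); // exponential growth, capped
    }
  }
}

// Demo: the fake upload fails twice with a transient error, then succeeds.
const waits: number[] = [];
let calls = 0;
const demo = withRetry(
  async () => {
    calls++;
    if (calls < 3) throw new TransientError("429 Too Many Requests");
    return "uuid-1";
  },
  { maxAttempts: 5, initialBackoffMs: 200, maxBackoffMs: 5000 },
  async ms => { waits.push(ms); }
);
demo.then(uuid => console.log(uuid, calls, waits));
```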
---
### 5.6 Manifest builder
**Responsibility**
Turn a set of `PublishResult` items into one manifest JSON.
**Interface**
```csharp
public interface IManifestBuilder
{
AttestationManifest Build(
ArtifactEvidence artifact,
IReadOnlyCollection<PublishResult> results,
string buildId,
DateTimeOffset publishedAtUtc
);
}
public interface IManifestWriter
{
Task WriteAsync(AttestationManifest manifest, string path, CancellationToken ct = default);
}
```
**Implementation plan**
* `JsonManifestBuilder`: pure mapping from models to manifest DTO.
* `FileSystemManifestWriter`: writes to a configurable path (e.g., `artifacts/attestations_manifest.json`).
---
## 6. Configuration & wiring
### 6.1 Options class
```csharp
public sealed class AttestationConfig
{
public string RekorBaseUrl { get; init; } = "";
public int RekorTimeoutSeconds { get; init; } = 30;
public int TargetChunkBytes { get; init; } = 800_000;
public int HardChunkBytes { get; init; } = 1_000_000;
public string DefaultPayloadType { get; init; } = "application/vnd.in-toto+json";
public int MaxAttempts { get; init; } = 5;
public int InitialBackoffMs { get; init; } = 200;
public int MaxBackoffMs { get; init; } = 5000;
public string CheckpointFilePath { get; init; } = ".attestations/checkpoints.json";
public string ManifestOutputPath { get; init; } = "attestations_manifest.json";
}
```
### 6.2 Example `appsettings.json` for CLI
```json
{
"Attestation": {
"RekorBaseUrl": "https://rekor.example.com",
"TargetChunkBytes": 800000,
"HardChunkBytes": 1000000,
"DefaultPayloadType": "application/vnd.in-toto+json",
"MaxAttempts": 5,
"InitialBackoffMs": 200,
"MaxBackoffMs": 5000,
"CheckpointFilePath": ".attestations/checkpoints.json",
"ManifestOutputPath": "attestations_manifest.json"
}
}
```
Wire via `IOptions<AttestationConfig>` in your DI container.
---
## 7. Observability & logging
### 7.1 Metrics (suggested)
Expose via your monitoring stack (Prometheus, App Insights, etc.):
* `attestations_chunks_total` labeled by `section`, `artifact_type`.
* `attestations_rekor_publish_success_total` labeled by `section`.
* `attestations_rekor_publish_failure_total` labeled by `section`, `failure_type` (4xx, 5xx, client_error).
* `attestations_rekor_latency_seconds` histogram.
* `attestations_chunk_size_bytes` histogram.
### 7.2 Logging
Log at **INFO**:
* Start/end of attestation publishing for each artifact.
* Number of chunks per section.
* Rekor UUID info (non-sensitive, ok to log).
Log at **DEBUG**:
* Exact Rekor request payload sizes.
* Retry attempts and backoff durations.
Log at **WARN/ERROR**:
* 4xx errors.
* Exhausted retries.
Include correlation IDs (build ID, artifact digest, chunk ID) in structured logs.
---
## 8. Testing strategy
### 8.1 Unit tests
* `ChunkerTests`
  * Small payload → 1 chunk.
  * Large payload → multiple chunks with no overlap and full coverage.
  * Deterministic `ChunkId` generation (same input ⇒ same IDs).
* `DsseSignerTests`
  * Given a fixed key and payload → DSSE envelope matches golden snapshot.
* `RekorClientTests`
  * Mock `HttpMessageHandler`:
    * 2xx success (e.g., 201 Created) → parse UUID, inclusion hash.
    * 409 / “already exists” → treat as success.
    * 429 & 5xx → throw `TransientHttpException`.
    * Other 4xx → throw `PermanentHttpException`.
* `CheckpointStoreTests`
* Put/TryGet behavior.
* Flush and reload from disk.
### 8.2 Integration tests
Against a **local or staging Rekor**:
* Publish single small attestation.
* Publish large SBOM that must be chunked.
* Simulate transient failure: first request 500, then 200; verify retry.
* Restart the test mid-flow, re-run; ensure already published chunks are skipped.
### 8.3 E2E in CI
* For a test project:
* Build → produce dummy SBOM/provenance.
* Run CLI to publish attestations.
* Archive:
* `attestations_manifest.json`.
* `checkpoints.json`.
* Optional: run a verification script that:
* Reads manifest.
* Queries Rekor for each UUID and validates inclusion.
---
## 9. CI integration (example)
Example GitHub Actions step (adapt as needed):
```yaml
- name: Publish attestations
  run: |
    # Assumes the project layout from section 3; adjust the project path as needed.
    dotnet run --project src/SupplyChain.Attestations.Cli -- publish \
      --artifact-id "${{ env.IMAGE_DIGEST }}" \
      --artifact-type "container-image" \
      --sbom "build/sbom.json" \
      --provenance "build/provenance.json" \
      --tests "build/test-results.json" \
      --config "attestation.appsettings.json"
  env:
    ATTESTATION_SIGNING_KEY: ${{ secrets.ATTESTATION_SIGNING_KEY }}
```
The CLI command should:
1. Construct `ArtifactEvidence` from the input files.
2. Use DI to build `AttestationPublisher` and dependencies.
3. Stream results, build manifest, write outputs.
4. Exit non-zero if any chunk fails to publish.
---
## 10. Implementation roadmap (dev-oriented)
You can translate this into epics/stories; here's a logical order:
**Epic 1: Core models & chunking**
* Story 1: Define `ArtifactEvidence`, `EvidenceBlob`, `ChunkRef`, `PublishResult`.
* Story 2: Implement `IChunker` with simple byte-based splitter.
* Story 3: Deterministic `ChunkId` generation + tests.
**Epic 2: Signing & DSSE envelopes**
* Story 4: Implement `IDsseSigner` + `KeyBasedDsseSigner`.
* Story 5: DSSE envelope serialization tests (golden snapshots).
* Story 6: Wire in an abstract crypto provider so you can swap key sources later.
**Epic 3: Rekor client**
* Story 7: Implement `IRekorClient` using `HttpClient`.
* Story 8: Error classification & `TransientHttpException` / `PermanentHttpException`.
* Story 9: Integration tests with staging/local Rekor.
**Epic 4: Publisher, checkpoints, manifest**
* Story 10: Implement `ICheckpointStore` (file-based JSON).
* Story 11: Implement `AttestationPublisher` with retry/backoff.
* Story 12: Implement `IManifestBuilder` + `IManifestWriter`.
* Story 13: Create manifest schema and sample.
**Epic 5: CLI & CI integration**
* Story 14: Implement CLI `publish` command.
* Story 15: Wire config (appsettings + env overrides).
* Story 16: Add CI job template + docs for teams.
**Epic 6: Observability & hardening**
* Story 17: Add metrics & structured logging.
* Story 18: Load testing with large SBOMs/logs.
* Story 19: Final documentation: “How to add attestations to your pipeline”.

View File

@@ -0,0 +1,514 @@
Here's a quick sizing rule of thumb for Sigstore attestations so you don't hit Rekor limits.
* **Base64 bloat:** DSSE wraps your JSON statement and then Base64-encodes it. Base64 turns every 3 bytes into 4, so size ≈ `ceil(P/3)*4` (about **+33–37%** on top of your raw JSON). ([Stack Overflow][1])
* **DSSE envelope fields:** Expect a small extra overhead for JSON keys like `payloadType`, `payload`, and `signatures` (and the signature itself). Sigstore's bundle/DSSE examples show the structure used. ([Sigstore][2])
* **Public Rekor cap:** The **public Rekor instance rejects uploads over 100 KB**. If your DSSE (after Base64 + JSON fields) exceeds that, shard/split the attestation or run your own Rekor. ([GitHub][3])
* **Reality check:** Teams routinely run into size errors when large statements are uploaded—the whole DSSE payload is sent to Rekor during verification/ingest. ([GitHub][4])
### Practical guidance
* Keep a **single attestation under ~65–70 KB of raw JSON** if it will be wrapped and Base64'd. Base64 alone turns 75 KB into 100 KB, so 80 KB of raw JSON already exceeds the cap before signatures and envelope keys are added.
* Prefer **compact JSON** (no whitespace), **short key names**, and **avoid huge embedded fields** (e.g., trim SBOM evidence or link it by digest/URI).
* For big evidence sets, publish **multiple attestations** (logical shards) or **self-host Rekor**. ([GitHub][3])
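The arithmetic behind these bullets can be checked with a small estimator. This is a rough sketch, not a measurement: the 100,000-byte cap comes from the text above, while `signature_bytes` and `json_overhead` are illustrative guesses for the signature and the envelope's JSON keys.

```python
REKOR_MAX_ENTRY_BYTES = 100_000  # assumed cap, matching the limit quoted above


def base64_len(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)


def estimate_envelope_bytes(payload_bytes: int,
                            signature_bytes: int = 72,
                            json_overhead: int = 200) -> int:
    """Rough DSSE envelope size: Base64 payload + Base64 signature + JSON keys.

    signature_bytes and json_overhead are illustrative guesses, not measured values.
    """
    return base64_len(payload_bytes) + base64_len(signature_bytes) + json_overhead


def max_raw_payload(limit: int = REKOR_MAX_ENTRY_BYTES,
                    signature_bytes: int = 72,
                    json_overhead: int = 200) -> int:
    """Largest raw payload whose estimated envelope still fits under limit."""
    budget = limit - base64_len(signature_bytes) - json_overhead
    return (budget // 4) * 3


if __name__ == "__main__":
    for raw in (60_000, 70_000, 80_000):
        est = estimate_envelope_bytes(raw)
        verdict = "fits" if est <= REKOR_MAX_ENTRY_BYTES else "too big"
        print(f"{raw} raw bytes -> ~{est} envelope bytes ({verdict})")
```

Whatever the overhead guesses, the Base64 term dominates: 80,000 raw bytes encode to 106,668 bytes on their own, so the raw JSON budget has to sit below roughly 75 KB.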
[1]: https://stackoverflow.com/questions/4715415/base64-what-is-the-worst-possible-increase-in-space-usage?utm_source=chatgpt.com "Base64: What is the worst possible increase in space usage?"
[2]: https://docs.sigstore.dev/about/bundle/?utm_source=chatgpt.com "Sigstore Bundle Format"
[3]: https://github.com/sigstore/rekor?utm_source=chatgpt.com "sigstore/rekor: Software Supply Chain Transparency Log"
[4]: https://github.com/sigstore/cosign/issues/3599?utm_source=chatgpt.com "Attestations require uploading entire payload to rekor #3599"
Here's a concrete, developer-friendly implementation plan you can hand to the team.
I'll assume:
* You're using **Sigstore (Fulcio + Rekor + DSSE)**.
* You're pushing to the **public Rekor instance**, which enforces a **100 KB per-entry size limit**.([GitHub][3])
* Attestations are JSON in a DSSE envelope and are produced in CI/CD.
You can copy this into a design doc and turn sections into tickets.
---
## 1. Goals & non-goals
**Goals**
1. Ensure **all Rekor uploads succeed** without hitting the 100 KB limit.
2. Provide a **deterministic pipeline**: same inputs → same set of attestations.
3. Avoid losing security signal: large data (SBOMs, logs, etc.) should still be verifiable via references.
**Non-goals**
* Changing Rekor itself (we'll treat it as a black box).
* Redesigning your whole supply chain; we're just changing how attestations are structured and uploaded.
---
## 2. Architecture changes (high-level)
Add three core pieces:
1. **Attestation Builder**: constructs one or more JSON statements per artifact.
2. **Size Guardrail & Sharder**: checks size *before* upload; splits or externalizes data if needed.
3. **Rekor Client Wrapper**: calls Rekor, handles size errors, and reports metrics.
Rough flow:
```text
CI job
→ gather metadata (subject digest, build info, SBOM, test results, etc.)
→ Attestation Builder (domain logic)
→ Size Guardrail & Sharder (JSON + DSSE + size checks)
→ Rekor Client Wrapper (upload + logging + metrics)
```
---
## 3. Config & constants (Ticket group A)
**A1: Add config**
* Add a configuration object / env variables:
```yaml
REKOR_MAX_ENTRY_BYTES: 100000 # current public limit, but treat as configurable
REKOR_SIZE_SAFETY_MARGIN: 0.9 # 90% of the limit as “soft” max
ATTESTATION_JSON_SOFT_MAX: 65000 # e.g. 65 KB JSON before DSSE/base64 (base64 adds ~33%, so 80 KB would overshoot the cap)
```
* Make **`REKOR_MAX_ENTRY_BYTES`** overridable so:
* you can bump it for a private Rekor deployment.
* tests can simulate different limits.
**Definition of done**
* Config is available to whatever component builds attestations (CI job, shared library, etc.).
* Unit tests read these values and assert behavior around boundary values.
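One way to make the limit overridable is to read it from the environment with validated defaults. The sketch below assumes the two variable names from the config above; the `RekorSizeConfig` class and `load_config` helper are hypothetical names introduced only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RekorSizeConfig:
    max_entry_bytes: int = 100_000   # hard cap enforced by the log
    safety_margin: float = 0.9       # fraction of the cap used as the soft limit

    def __post_init__(self) -> None:
        if self.max_entry_bytes <= 0:
            raise ValueError("max_entry_bytes must be positive")
        if not 0 < self.safety_margin <= 1:
            raise ValueError("safety_margin must be in (0, 1]")

    @property
    def soft_max_bytes(self) -> int:
        """Envelope size above which sharding/externalization should kick in."""
        return int(self.max_entry_bytes * self.safety_margin)


def load_config(env: Mapping[str, str] = os.environ) -> RekorSizeConfig:
    """Build the config from environment-style key/value pairs, falling back to defaults."""
    defaults = RekorSizeConfig()
    return RekorSizeConfig(
        max_entry_bytes=int(env.get("REKOR_MAX_ENTRY_BYTES", defaults.max_entry_bytes)),
        safety_margin=float(env.get("REKOR_SIZE_SAFETY_MARGIN", defaults.safety_margin)),
    )
```

Passing a plain dictionary instead of `os.environ` lets unit tests simulate a private Rekor deployment with a different cap.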
---
## 4. Attestation schema guidelines (Ticket group B)
**B1: Define / revise schema**
For each statement type (e.g., SLSA, SBOM, test results):
* Mark **required vs optional** fields.
* Identify **large fields**:
* SBOM JSON
* long log lines
* full dependency lists
* coverage details
**Rule:**
> Large data should **not** be inlined; it should be stored externally and referenced by digest.
Add a standard “external evidence” shape:
```json
{
"externalEvidence": [
{
"type": "sbom-spdx-json",
"uri": "https://artifacts.example.com/sbom/<build-id>.json",
"digest": "sha256:abcd...",
"sizeBytes": 123456
}
]
}
```
**B2: Budget fields**
* For each statement type, estimate typical sizes:
* Fixed overhead (keys, small fields).
* Variable data (e.g., components length).
* Document a **rule of thumb**:
“Total JSON payload for type X should be ≤ ~65 KB (Base64 inflates it by about a third); otherwise we split or externalize.”
**Definition of done**
* Schema docs updated with “size budget” notes.
* New `externalEvidence` (or equivalent) field defined and versioned.
---
## 5. Size Guardrail & Estimator (Ticket group C)
This is the core safety net.
### C1: Implement JSON size estimator
Language-agnostic idea:
```pseudo
function jsonBytes(payloadObject): int {
jsonString = JSON.stringify(payloadObject, no_whitespace)
return length(utf8_encode(jsonString))
}
```
* Always **minify** (no pretty printing) for the final payload.
* Use UTF-8 byte length, not character count.
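As a concrete counterpart to the `jsonBytes` pseudocode, here is a minimal Python sketch. Sorting keys is an extra assumption added here so that the same input always serializes to the same bytes; the pseudocode itself only requires minified output.

```python
import json


def canonical_json(payload) -> bytes:
    """Minified, key-sorted JSON encoded as UTF-8."""
    text = json.dumps(payload, separators=(",", ":"), sort_keys=True, ensure_ascii=False)
    return text.encode("utf-8")


def json_bytes(payload) -> int:
    """Size in bytes (not characters) of the minified payload."""
    return len(canonical_json(payload))


if __name__ == "__main__":
    statement = {"subject": "sha256:abcd", "note": "café"}
    pretty = json.dumps(statement, indent=2, ensure_ascii=False).encode("utf-8")
    print(json_bytes(statement), "bytes minified vs", len(pretty), "bytes pretty-printed")
```

Non-ASCII characters are where byte length and character count diverge, which is why the measurement is taken after UTF-8 encoding.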
### C2: DSSE + base64 size estimator
Instead of guessing, **actually build the envelope** before upload:
```pseudo
function buildDsseEnvelope(statementJson: string, signature: bytes, keyId: string): string {
envelope = {
"payloadType": "application/vnd.in-toto+json",
"payload": base64_encode(statementJson),
"signatures": [
{
"sig": base64_encode(signature),
"keyid": keyId
}
]
}
return JSON.stringify(envelope, no_whitespace)
}
function envelopeBytes(envelopeJson: string): int {
return length(utf8_encode(envelopeJson))
}
```
**Rule:** if `envelopeBytes(envelopeJson) > REKOR_MAX_ENTRY_BYTES * REKOR_SIZE_SAFETY_MARGIN`, we consider this envelope **too big** and trigger sharding / externalization logic before calling Rekor.
> Note: This means you temporarily sign once to measure size. That's acceptable; signing is cheap compared to a failing Rekor upload.
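A runnable version of the build-then-measure step might look like the sketch below. The `demo_sign` function is a stand-in (an HMAC over the DSSE pre-authentication encoding) used only so the example is self-contained; a real pipeline would plug in its own signer, and the resulting signature length would differ.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def build_dsse_envelope(statement: dict, sign, key_id: str) -> bytes:
    """Serialize the statement, sign it, and return the minified envelope bytes."""
    payload = json.dumps(statement, separators=(",", ":"), sort_keys=True).encode("utf-8")
    signature = sign(pae(PAYLOAD_TYPE, payload))
    envelope = {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [
            {"keyid": key_id, "sig": base64.b64encode(signature).decode("ascii")}
        ],
    }
    return json.dumps(envelope, separators=(",", ":")).encode("utf-8")


def demo_sign(message: bytes) -> bytes:
    """Stand-in signer for the example only; NOT a real attestation signature."""
    return hmac.new(b"demo-key", message, hashlib.sha256).digest()


if __name__ == "__main__":
    envelope = build_dsse_envelope({"subject": "sha256:abcd"}, demo_sign, "demo")
    print(len(envelope), "envelope bytes")
```

Measuring `len(envelope)` on the exact bytes that will be uploaded avoids guessing at Base64 and JSON overhead.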
### C3: Guardrail function
```pseudo
function ensureWithinRekorLimit(envelopeJson: string) {
bytes = envelopeBytes(envelopeJson)
if bytes > REKOR_MAX_ENTRY_BYTES {
throw new OversizeAttestationError(bytes, REKOR_MAX_ENTRY_BYTES)
}
}
```
**Definition of done**
* Utility functions for `jsonBytes`, `buildDsseEnvelope`, `envelopeBytes`, and `ensureWithinRekorLimit`.
* Unit tests:
* Below limit → pass.
* Exactly at limit → pass.
* Above limit → throws `OversizeAttestationError`.
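The guardrail and its boundary tests can be sketched as follows. The default of 100,000 bytes and the 0.9 margin mirror the config values above; `exceeds_soft_limit` is a hypothetical helper name for the soft-limit rule described in C2.

```python
class OversizeAttestationError(Exception):
    """Raised when an envelope is larger than the configured Rekor limit."""

    def __init__(self, actual_bytes: int, max_bytes: int) -> None:
        super().__init__(f"envelope is {actual_bytes} bytes; limit is {max_bytes}")
        self.actual_bytes = actual_bytes
        self.max_bytes = max_bytes


def ensure_within_rekor_limit(envelope: bytes, max_bytes: int = 100_000) -> None:
    """Hard check: raise if the envelope exceeds the cap (exactly at the cap passes)."""
    if len(envelope) > max_bytes:
        raise OversizeAttestationError(len(envelope), max_bytes)


def exceeds_soft_limit(envelope: bytes,
                       max_bytes: int = 100_000,
                       safety_margin: float = 0.9) -> bool:
    """Soft check: True when sharding/externalization should be attempted first."""
    return len(envelope) > max_bytes * safety_margin
```

Keeping the soft and hard checks separate lets the builder react to the soft limit while the client wrapper still refuses anything over the hard cap.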
---
## 6. Sharding / externalization strategy (Ticket group D)
This is where you decide *what to do* when a statement is too big.
### D1: Strategy decision
Implement in this order:
1. **Externalize big blobs** (preferred).
2. If still too big, **shard** into multiple attestations.
#### 1) Externalization rules
Examples:
* SBOM:
* Write full SBOM to artifact store or object storage (S3, GCS, internal).
* In attestation, keep only:
* URI
* hash
* size
* format
* Test logs:
* Keep only summary + URI to full logs.
Implement a helper:
```pseudo
function externalizeIfLarge(fieldName, dataBytes, thresholdBytes): RefOrInline {
if length(dataBytes) <= thresholdBytes {
return { "inline": true, "value": dataBytes }
} else {
uri = uploadToArtifactStore(dataBytes)
digest = sha256(dataBytes)
return {
"inline": false,
"uri": uri,
"digest": "sha256:" + digest
}
}
}
```
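For illustration, the same helper in Python with an in-memory store standing in for the artifact store. The `mem://` URI scheme and `make_memory_store` are made up for this sketch; the returned reference also carries `type` and `sizeBytes` so it matches the `externalEvidence` shape from section 4.

```python
import hashlib
from typing import Callable, Dict, Tuple


def externalize_if_large(data: bytes,
                         threshold_bytes: int,
                         upload: Callable[[bytes, str], str],
                         evidence_type: str) -> dict:
    """Return the data inline when small, otherwise upload it and return a digest reference."""
    if len(data) <= threshold_bytes:
        return {"inline": True, "value": data.decode("utf-8")}
    digest = hashlib.sha256(data).hexdigest()
    uri = upload(data, digest)
    return {
        "inline": False,
        "type": evidence_type,
        "uri": uri,
        "digest": f"sha256:{digest}",
        "sizeBytes": len(data),
    }


def make_memory_store() -> Tuple[Dict[str, bytes], Callable[[bytes, str], str]]:
    """Hypothetical artifact store for the example: keeps blobs in a dict keyed by digest."""
    blobs: Dict[str, bytes] = {}

    def upload(data: bytes, digest: str) -> str:
        blobs[digest] = data
        return f"mem://evidence/{digest}"

    return blobs, upload
```

Keying the stored blob by its digest makes the upload idempotent, which helps keep repeated builds deterministic.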
#### 2) Sharding rules
Example for SBOM-like data: if you have a big `components` list:
```pseudo
MAX_COMPONENTS_PER_ATTESTATION = 1000 # tune this via tests
function shardComponents(components[]):
chunks = chunk(components, MAX_COMPONENTS_PER_ATTESTATION)
attestations = []
for each chunk in chunks:
att = baseStatement()
att["components"] = chunk
attestations.append(att)
return attestations
```
After sharding:
* Each chunk becomes its **own statement** (and its own DSSE envelope + Rekor entry).
* Each statement should include:
* The same **subject (artifact digest)**.
* A `shardId` and `shardCount`, or a `groupId` (e.g., build ID) to relate them.
Example:
```json
{
"_sharding": {
"groupId": "build-1234-sbom",
"shardIndex": 0,
"shardCount": 3
}
}
```
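Combining the chunking pseudocode with the `_sharding` metadata, a count-based sharder could look like this sketch. The function signature and the deep copy of the base statement are choices made for this example.

```python
import copy


def shard_components(base_statement: dict,
                     components: list,
                     group_id: str,
                     max_per_shard: int = 1000) -> list:
    """Split components into fixed-size chunks, one statement per chunk."""
    if max_per_shard <= 0:
        raise ValueError("max_per_shard must be positive")
    chunks = [components[i:i + max_per_shard]
              for i in range(0, len(components), max_per_shard)] or [[]]
    shards = []
    for index, chunk in enumerate(chunks):
        statement = copy.deepcopy(base_statement)  # every shard keeps the same subject
        statement["components"] = list(chunk)
        statement["_sharding"] = {
            "groupId": group_id,
            "shardIndex": index,
            "shardCount": len(chunks),
        }
        shards.append(statement)
    return shards
```

Because chunk boundaries depend only on input order and `max_per_shard`, the same inputs always produce the same set of shards.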
**D2: Integration with size guardrail**
Flow:
1. Build full statement.
2. If `jsonBytes(statement) <= ATTESTATION_JSON_SOFT_MAX`: use as-is.
3. Else:
* Try externalizing big fields.
* Re-measure JSON size.
4. If still above `ATTESTATION_JSON_SOFT_MAX`:
* Apply sharding (e.g., split `components` list).
5. For each shard:
* Build DSSE envelope.
* Run `ensureWithinRekorLimit`.
If after sharding a single shard **still** exceeds Rekor's limit, you must:
* Fail the pipeline with a **clear error**.
* Log enough diagnostics to adjust your thresholds or schemas.
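A size-driven variant of the sharding step is sketched below. Note this is a different technique from the fixed `MAX_COMPONENTS_PER_ATTESTATION` chunking shown earlier: it packs items greedily against a byte budget and fails loudly when one item cannot fit on its own. The function names and the `components` default are assumptions for the example.

```python
import json


def json_bytes(obj) -> int:
    """UTF-8 size of the minified, key-sorted JSON form."""
    return len(json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8"))


def pack_by_size(base: dict, items: list, soft_max: int, field: str = "components") -> list:
    """Greedily group items so each resulting statement stays within soft_max bytes."""
    overhead = json_bytes({**base, field: []})  # statement with an empty list
    shards, current, current_size = [], [], overhead
    for item in items:
        item_size = json_bytes(item)
        if overhead + item_size > soft_max:
            # A single item cannot fit even alone: surface a clear error.
            raise ValueError(f"item of {item_size} bytes cannot fit in {soft_max} bytes")
        extra = item_size + (1 if current else 0)  # +1 for the separating comma
        if current_size + extra > soft_max:
            shards.append(current)
            current, current_size = [], overhead
            extra = item_size
        current.append(item)
        current_size += extra
    if current or not shards:
        shards.append(current)
    return shards
```

The running size is exact for minified JSON, so each shard is measured once per item rather than re-serialized on every append.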
**Definition of done**
* Implementation for:
* `externalizeIfLarge`,
* `shardComponents` (or equivalent for your large arrays),
* `_sharding` metadata.
* Tests:
* Large SBOM → multiple attestations, each under size limit.
* Externalization correctly moves large fields out and keeps digests.
---
## 7. Rekor client wrapper (Ticket group E)
### E1: Wrap Rekor interactions
Create a small abstraction:
```pseudo
class RekorClient {
function uploadDsseEnvelope(envelopeJson: string): LogEntryRef {
ensureWithinRekorLimit(envelopeJson)
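// Simplified: the real Rekor API expects the envelope wrapped in a typed entry (kind, apiVersion, spec), not the bare envelope.
response = http.post(REKOR_URL + "/api/v1/log/entries", body=envelopeJson)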
if response.statusCode == 201 or response.statusCode == 200:
return parseLogEntryRef(response.body)
else if response.statusCode == 413 or isSizeError(response.body):
throw new RekorSizeLimitError(response.statusCode, response.body)
else:
throw new RekorUploadError(response.statusCode, response.body)
}
}
```
* The `ensureWithinRekorLimit` call should prevent most 413s.
* `isSizeError` should inspect message strings that mention “size”, “100 KB”, etc., just in case Rekor's error handling changes.
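The response classification can be isolated from the HTTP call so it is easy to unit test. In this sketch the hint strings in `SIZE_HINTS` are guesses, not Rekor's documented error messages, and `check_upload_response` is a hypothetical helper name.

```python
class RekorUploadError(Exception):
    """Any non-success response from the log."""

    def __init__(self, status_code: int, body: str) -> None:
        super().__init__(f"Rekor upload failed with HTTP {status_code}")
        self.status_code = status_code
        self.body = body


class RekorSizeLimitError(RekorUploadError):
    """The log rejected the entry because it was too large."""


# Illustrative substrings only; adjust to the messages your Rekor instance returns.
SIZE_HINTS = ("too large", "size limit", "exceeds")


def is_size_error(body: str) -> bool:
    lowered = body.lower()
    return any(hint in lowered for hint in SIZE_HINTS)


def check_upload_response(status_code: int, body: str) -> None:
    """Return on success; raise a size-specific or generic error otherwise."""
    if status_code in (200, 201):
        return
    if status_code == 413 or is_size_error(body):
        raise RekorSizeLimitError(status_code, body)
    raise RekorUploadError(status_code, body)
```

Making the size error a subclass lets callers catch it specifically for the oversize handling while still treating it as an upload failure elsewhere.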
### E2: Error handling strategy
On `RekorSizeLimitError`:
* Mark the build as **failed** (or at least **non-compliant**).
* Emit a structured log event:
```json
{
"event": "rekor_upload_oversize",
"envelopeBytes": 123456,
"rekorMaxBytes": 100000,
"buildId": "build-1234"
}
```
* (Optional) Attach the JSON size breakdown for debugging.
**Definition of done**
* Wrapper around existing Rekor client (or direct HTTP).
* Tests for:
* Successful upload.
* Simulated 413 / size error → recognized and surfaced cleanly.
---
## 8. CI/CD integration (Ticket group F)
### F1: Where to run this
Integrate in your pipeline step that currently does signing, e.g.:
```text
build → test → sign → attest → rekor-upload → deploy
```
Change to:
```text
build → test → sign → build-attestations (w/ size control)
→ upload-all-attestations-to-rekor
→ deploy
```
### F2: Multi-entry handling
If sharding is used:
* The pipeline should treat **“all relevant attestations uploaded successfully”** as a success condition.
* Store a manifest per build:
```json
{
"buildId": "build-1234",
"subjectDigest": "sha256:abcd...",
"attestationEntries": [
{
"type": "slsa",
"rekorLogIndex": 123456,
"shardIndex": 0,
"shardCount": 1
},
{
"type": "sbom",
"rekorLogIndex": 123457,
"shardIndex": 0,
"shardCount": 3
}
]
}
```
This manifest can be stored in your artifact store and used later by verifiers.
**Definition of done**
* CI job updated.
* Build manifest persisted.
* Documentation updated so ops/security know where to find attestation references.
---
## 9. Verification path updates (Ticket group G)
If you shard or externalize, your **verifiers** need to understand that.
### G1: Verify external evidence
* When verifying, for each `externalEvidence` entry:
* Fetch the blob from its URI.
* Compute its digest.
* Compare with the digest in the attestation.
* Decide whether verifiers:
* Must fetch all external evidence (strict), or
* Are allowed to do “metadata-only” verification if evidence URLs look trustworthy.
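The strict path (fetch, hash, compare) might be sketched like this. The `fetch` callable is an assumption standing in for whatever HTTP or object-store client the verifier uses, and only `sha256:` digests are handled here.

```python
import hashlib
import hmac
from typing import Callable


def verify_external_evidence(entry: dict, fetch: Callable[[str], bytes]) -> bool:
    """Fetch the blob behind an externalEvidence entry and check it against its digest."""
    algorithm, _, expected_hex = entry["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    blob = fetch(entry["uri"])
    if "sizeBytes" in entry and len(blob) != entry["sizeBytes"]:
        return False  # cheap early rejection before comparing hashes
    actual_hex = hashlib.sha256(blob).hexdigest()
    return hmac.compare_digest(actual_hex, expected_hex.lower())
```

Because the digest is covered by the attestation's signature, a matching hash ties the externally stored blob back to the signed statement regardless of where it is hosted.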
### G2: Verify sharded attestations
* Given a build ID or subject digest:
* Look up all Rekor entries for that subject (or use your manifest).
* Group by `_sharding.groupId`.
* Ensure all shards are present (`shardCount`).
* Verify each shard's signature and subject digest.
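The completeness check on `_sharding` metadata can be sketched as below. It assumes the statements have already had their signatures and subject digests verified; `find_shard_problems` is a hypothetical helper name.

```python
from collections import defaultdict


def find_shard_problems(statements: list) -> dict:
    """Group statements by _sharding.groupId and report incomplete or inconsistent groups."""
    groups = defaultdict(list)
    for statement in statements:
        meta = statement["_sharding"]
        groups[meta["groupId"]].append(meta)

    problems = {}
    for group_id, metas in groups.items():
        counts = {m["shardCount"] for m in metas}
        if len(counts) != 1:
            problems[group_id] = "inconsistent shardCount"
            continue
        expected = set(range(counts.pop()))
        seen = [m["shardIndex"] for m in metas]
        missing = sorted(expected - set(seen))
        if missing:
            problems[group_id] = f"missing shard(s): {missing}"
        elif len(seen) != len(expected):
            problems[group_id] = "duplicate or out-of-range shard index"
    return problems
```

An empty result means every group has exactly one statement per index from 0 to `shardCount - 1`.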
**Definition of done**
* Verifier code updated to:
* Handle `externalEvidence`.
* Handle `_sharding` metadata.
* Integration test:
* End-to-end: build → shard → upload → verify all shards and external evidence.
---
## 10. Observability & guardrails (Ticket group H)
**H1: Metrics**
Add these metrics:
* `attestation_json_size_bytes` (per type).
* `rekor_envelope_size_bytes` (per type).
* Counters:
* `attestation_sharded_total`
* `attestation_externalized_total`
* `rekor_upload_oversize_total`
**H2: Alerts**
* If `rekor_upload_oversize_total` > 0 over some window → alert.
* If average `rekor_envelope_size_bytes` > 70–80% of limit for a sustained period → investigate schema growth.
---
## 11. Suggested ticket breakdown
You can cut this into roughly these tickets:
1. **Config & constants for Rekor size limits** (A).
2. **Schema update: support externalEvidence + sharding metadata** (B).
3. **Implement JSON & DSSE size estimation utilities** (C1–C3).
4. **Implement externalization of SBOMs/logs and size-aware builder** (D1).
5. **Implement sharding for large arrays (e.g., components)** (D1–D2).
6. **Wrap Rekor client with size checks and error handling** (E).
7. **CI pipeline integration + build manifest** (F).
8. **Verifier changes for sharding + external evidence** (G).
9. **Metrics & alerts for attestation/Rekor sizes** (H).
---
[1]: https://github.com/sigstore/rekor?utm_source=chatgpt.com "sigstore/rekor: Software Supply Chain Transparency Log"

View File

@@ -45,6 +45,7 @@ internal static class CommandFactory
root.Add(BuildKmsCommand(services, verboseOption, cancellationToken));
root.Add(BuildVulnCommand(services, verboseOption, cancellationToken));
root.Add(BuildCryptoCommand(services, verboseOption, cancellationToken));
root.Add(BuildRiskProfileCommand(verboseOption, cancellationToken));
var pluginLogger = loggerFactory.CreateLogger<CliCommandModuleLoader>();
var pluginLoader = new CliCommandModuleLoader(services, options, pluginLogger);
@@ -1607,4 +1608,64 @@ internal static class CommandFactory
_ => $"{value[..2]}***{value[^2..]}"
};
}
private static Command BuildRiskProfileCommand(Option<bool> verboseOption, CancellationToken cancellationToken)
{
_ = cancellationToken;
var riskProfile = new Command("risk-profile", "Manage risk profile schemas and validation.");
var validate = new Command("validate", "Validate a risk profile JSON file against the schema.");
var inputOption = new Option<string>("--input", new[] { "-i" })
{
Description = "Path to the risk profile JSON file to validate.",
Required = true
};
var formatOption = new Option<string?>("--format")
{
Description = "Output format: table (default) or json."
};
var outputOption = new Option<string?>("--output")
{
Description = "Write validation report to the specified file path."
};
var strictOption = new Option<bool>("--strict")
{
Description = "Treat warnings as errors (exit code 1 on any issue)."
};
validate.Add(inputOption);
validate.Add(formatOption);
validate.Add(outputOption);
validate.Add(strictOption);
validate.SetAction((parseResult, _) =>
{
var input = parseResult.GetValue(inputOption) ?? string.Empty;
var format = parseResult.GetValue(formatOption) ?? "table";
var output = parseResult.GetValue(outputOption);
var strict = parseResult.GetValue(strictOption);
var verbose = parseResult.GetValue(verboseOption);
return CommandHandlers.HandleRiskProfileValidateAsync(input, format, output, strict, verbose);
});
var schema = new Command("schema", "Display or export the risk profile JSON schema.");
var schemaOutputOption = new Option<string?>("--output")
{
Description = "Write the schema to the specified file path."
};
schema.Add(schemaOutputOption);
schema.SetAction((parseResult, _) =>
{
var output = parseResult.GetValue(schemaOutputOption);
var verbose = parseResult.GetValue(verboseOption);
return CommandHandlers.HandleRiskProfileSchemaAsync(output, verbose);
});
riskProfile.Add(validate);
riskProfile.Add(schema);
return riskProfile;
}
}

View File

@@ -7810,4 +7810,198 @@ internal static class CommandHandlers
}
private sealed record ProviderInfo(string Name, string Type, IReadOnlyList<CryptoProviderKeyDescriptor> Keys);
#region Risk Profile Commands
public static async Task HandleRiskProfileValidateAsync(
string inputPath,
string format,
string? outputPath,
bool strict,
bool verbose)
{
_ = verbose;
using var activity = CliActivitySource.Instance.StartActivity("cli.riskprofile.validate", ActivityKind.Client);
using var duration = CliMetrics.MeasureCommandDuration("risk-profile validate");
try
{
if (!File.Exists(inputPath))
{
AnsiConsole.MarkupLine("[red]Error:[/] Input file not found: {0}", Markup.Escape(inputPath));
Environment.ExitCode = 1;
return;
}
var profileJson = await File.ReadAllTextAsync(inputPath).ConfigureAwait(false);
var schema = StellaOps.Policy.RiskProfile.Schema.RiskProfileSchemaProvider.GetSchema();
var schemaVersion = StellaOps.Policy.RiskProfile.Schema.RiskProfileSchemaProvider.GetSchemaVersion();
JsonNode? profileNode;
try
{
profileNode = JsonNode.Parse(profileJson);
if (profileNode is null)
{
throw new InvalidOperationException("Parsed JSON is null.");
}
}
catch (JsonException ex)
{
AnsiConsole.MarkupLine("[red]Error:[/] Invalid JSON: {0}", Markup.Escape(ex.Message));
Environment.ExitCode = 1;
return;
}
var result = schema.Evaluate(profileNode);
var issues = new List<RiskProfileValidationIssue>();
if (!result.IsValid)
{
CollectValidationIssues(result, issues);
}
var report = new RiskProfileValidationReport(
FilePath: inputPath,
IsValid: result.IsValid,
SchemaVersion: schemaVersion,
Issues: issues);
if (format.Equals("json", StringComparison.OrdinalIgnoreCase))
{
var reportJson = JsonSerializer.Serialize(report, new JsonSerializerOptions
{
WriteIndented = true,
PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower
});
if (!string.IsNullOrEmpty(outputPath))
{
await File.WriteAllTextAsync(outputPath, reportJson).ConfigureAwait(false);
AnsiConsole.MarkupLine("Validation report written to [cyan]{0}[/]", Markup.Escape(outputPath));
}
else
{
Console.WriteLine(reportJson);
}
}
else
{
if (result.IsValid)
{
AnsiConsole.MarkupLine("[green]✓[/] Profile is valid (schema v{0})", schemaVersion);
}
else
{
AnsiConsole.MarkupLine("[red]✗[/] Profile is invalid (schema v{0})", schemaVersion);
AnsiConsole.WriteLine();
var table = new Table();
table.AddColumn("Path");
table.AddColumn("Error");
table.AddColumn("Message");
foreach (var issue in issues)
{
table.AddRow(
Markup.Escape(issue.Path),
Markup.Escape(issue.Error),
Markup.Escape(issue.Message));
}
AnsiConsole.Write(table);
}
if (!string.IsNullOrEmpty(outputPath))
{
var reportJson = JsonSerializer.Serialize(report, new JsonSerializerOptions
{
WriteIndented = true,
PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower
});
await File.WriteAllTextAsync(outputPath, reportJson).ConfigureAwait(false);
AnsiConsole.MarkupLine("Validation report written to [cyan]{0}[/]", Markup.Escape(outputPath));
}
}
// Schema evaluation yields errors only (no warnings yet), so an invalid profile
// fails the command with or without --strict.
_ = strict;
Environment.ExitCode = result.IsValid ? 0 : 1;
}
catch (Exception ex)
{
AnsiConsole.MarkupLine("[red]Error:[/] {0}", Markup.Escape(ex.Message));
Environment.ExitCode = 1;
}
await Task.CompletedTask.ConfigureAwait(false);
}
public static async Task HandleRiskProfileSchemaAsync(string? outputPath, bool verbose)
{
_ = verbose;
using var activity = CliActivitySource.Instance.StartActivity("cli.riskprofile.schema", ActivityKind.Client);
using var duration = CliMetrics.MeasureCommandDuration("risk-profile schema");
try
{
var schemaText = StellaOps.Policy.RiskProfile.Schema.RiskProfileSchemaProvider.GetSchemaText();
var schemaVersion = StellaOps.Policy.RiskProfile.Schema.RiskProfileSchemaProvider.GetSchemaVersion();
if (!string.IsNullOrEmpty(outputPath))
{
await File.WriteAllTextAsync(outputPath, schemaText).ConfigureAwait(false);
AnsiConsole.MarkupLine("Risk profile schema v{0} written to [cyan]{1}[/]", schemaVersion, Markup.Escape(outputPath));
}
else
{
Console.WriteLine(schemaText);
}
Environment.ExitCode = 0;
}
catch (Exception ex)
{
AnsiConsole.MarkupLine("[red]Error:[/] {0}", Markup.Escape(ex.Message));
Environment.ExitCode = 1;
}
}
private static void CollectValidationIssues(
Json.Schema.EvaluationResults results,
List<RiskProfileValidationIssue> issues,
string path = "")
{
if (results.Errors is not null)
{
foreach (var (key, message) in results.Errors)
{
var instancePath = results.InstanceLocation?.ToString() ?? path;
issues.Add(new RiskProfileValidationIssue(instancePath, key, message));
}
}
if (results.Details is not null)
{
foreach (var detail in results.Details)
{
if (!detail.IsValid)
{
CollectValidationIssues(detail, issues, detail.InstanceLocation?.ToString() ?? path);
}
}
}
}
private sealed record RiskProfileValidationReport(
string FilePath,
bool IsValid,
string SchemaVersion,
IReadOnlyList<RiskProfileValidationIssue> Issues);
private sealed record RiskProfileValidationIssue(string Path, string Error, string Message);
#endregion
}

View File

@@ -54,6 +54,7 @@
<ProjectReference Include="../../Scanner/__Libraries/StellaOps.Scanner.Analyzers.Lang.Java/StellaOps.Scanner.Analyzers.Lang.Java.csproj" />
<ProjectReference Include="../../Scanner/__Libraries/StellaOps.Scanner.Surface.Env/StellaOps.Scanner.Surface.Env.csproj" />
<ProjectReference Include="../../Scanner/__Libraries/StellaOps.Scanner.Surface.Validation/StellaOps.Scanner.Surface.Validation.csproj" />
<ProjectReference Include="../../Policy/StellaOps.Policy.RiskProfile/StellaOps.Policy.RiskProfile.csproj" />
</ItemGroup>
<ItemGroup Condition="'$(StellaOpsEnableCryptoPro)' == 'true'">

View File

@@ -1,5 +1,5 @@
using System.Security.Cryptography;
using System.Text;
using StellaOps.Cryptography;
namespace StellaOps.EvidenceLocker.Core.Builders;
@@ -10,6 +10,35 @@ public interface IMerkleTreeCalculator
public sealed class MerkleTreeCalculator : IMerkleTreeCalculator
{
private readonly ICryptoHasher _hasher;
/// <summary>
/// Creates a MerkleTreeCalculator using the specified hasher.
/// </summary>
/// <param name="hasher">Crypto hasher resolved from the provider registry.</param>
public MerkleTreeCalculator(ICryptoHasher hasher)
{
_hasher = hasher ?? throw new ArgumentNullException(nameof(hasher));
}
/// <summary>
/// Creates a MerkleTreeCalculator using the crypto registry to resolve the hasher.
/// </summary>
/// <param name="cryptoRegistry">Crypto provider registry.</param>
/// <param name="algorithmId">Hash algorithm to use (defaults to SHA256).</param>
/// <param name="preferredProvider">Optional preferred crypto provider.</param>
public MerkleTreeCalculator(
ICryptoProviderRegistry cryptoRegistry,
string? algorithmId = null,
string? preferredProvider = null)
{
ArgumentNullException.ThrowIfNull(cryptoRegistry);
var algorithm = algorithmId ?? HashAlgorithms.Sha256;
var resolution = cryptoRegistry.ResolveHasher(algorithm, preferredProvider);
_hasher = resolution.Hasher;
}
public string CalculateRootHash(IEnumerable<string> canonicalLeafValues)
{
var leaves = canonicalLeafValues
@@ -24,7 +53,7 @@ public sealed class MerkleTreeCalculator : IMerkleTreeCalculator
return BuildTree(leaves);
}
private static string BuildTree(IReadOnlyList<string> currentLevel)
private string BuildTree(IReadOnlyList<string> currentLevel)
{
if (currentLevel.Count == 1)
{
@@ -45,10 +74,9 @@ public sealed class MerkleTreeCalculator : IMerkleTreeCalculator
return BuildTree(nextLevel);
}
private static string HashString(string value)
private string HashString(string value)
{
var bytes = Encoding.UTF8.GetBytes(value);
var hash = SHA256.HashData(bytes);
return Convert.ToHexString(hash).ToLowerInvariant();
return _hasher.ComputeHashHex(bytes);
}
}

View File

@@ -24,6 +24,11 @@ public sealed class EvidenceLockerOptions
public PortableOptions Portable { get; init; } = new();
public IncidentModeOptions Incident { get; init; } = new();
/// <summary>
/// Cryptographic options for hash algorithm selection and provider routing.
/// </summary>
public EvidenceCryptoOptions Crypto { get; init; } = new();
}
public sealed class DatabaseOptions
@@ -208,3 +213,20 @@ public sealed class PortableOptions
[MinLength(1)]
public string MetadataFileName { get; init; } = "bundle.json";
}
/// <summary>
/// Cryptographic options for evidence bundle hashing and provider routing.
/// </summary>
public sealed class EvidenceCryptoOptions
{
/// <summary>
/// Hash algorithm used for Merkle tree computation. Defaults to SHA256.
/// Supported: SHA256, SHA384, SHA512, GOST3411-2012-256, GOST3411-2012-512.
/// </summary>
public string HashAlgorithm { get; init; } = HashAlgorithms.Sha256;
/// <summary>
/// Preferred crypto provider name. When null, the registry uses its default resolution order.
/// </summary>
public string? PreferredProvider { get; init; }
}

View File

@@ -8,6 +8,7 @@ using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using StellaOps.Cryptography;
using StellaOps.Cryptography.DependencyInjection;
using StellaOps.Cryptography.Plugin.BouncyCastle;
using StellaOps.EvidenceLocker.Core.Builders;
@@ -61,7 +62,15 @@ public static class EvidenceLockerInfrastructureServiceCollectionExtensions
services.AddSingleton<IEvidenceLockerMigrationRunner, EvidenceLockerMigrationRunner>();
services.AddHostedService<EvidenceLockerMigrationHostedService>();
services.AddSingleton<IMerkleTreeCalculator, MerkleTreeCalculator>();
services.AddSingleton<IMerkleTreeCalculator>(provider =>
{
var options = provider.GetRequiredService<IOptions<EvidenceLockerOptions>>().Value;
var cryptoRegistry = provider.GetRequiredService<ICryptoProviderRegistry>();
return new MerkleTreeCalculator(
cryptoRegistry,
options.Crypto.HashAlgorithm,
options.Crypto.PreferredProvider);
});
services.AddScoped<IEvidenceBundleBuilder, EvidenceBundleBuilder>();
services.AddScoped<IEvidenceBundleRepository, EvidenceBundleRepository>();

View File

@@ -0,0 +1,251 @@
using System.Collections.Immutable;
using System.Net;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Channels;
using Xunit;
namespace StellaOps.Notifier.Tests.Channels;
public sealed class WebhookChannelAdapterTests
{
[Fact]
public async Task DispatchAsync_SuccessfulDelivery_ReturnsSuccess()
{
// Arrange
var handler = new MockHttpMessageHandler(HttpStatusCode.OK, "ok");
var httpClient = new HttpClient(handler);
var auditRepo = new InMemoryAuditRepository();
var options = Options.Create(new ChannelAdapterOptions());
var adapter = new WebhookChannelAdapter(
httpClient,
auditRepo,
options,
TimeProvider.System,
NullLogger<WebhookChannelAdapter>.Instance);
var channel = CreateChannel("https://example.com/webhook");
var context = CreateContext(channel);
// Act
var result = await adapter.DispatchAsync(context, CancellationToken.None);
// Assert
Assert.True(result.Success);
Assert.Equal(ChannelDispatchStatus.Sent, result.Status);
Assert.Single(handler.Requests);
}
[Fact]
public async Task DispatchAsync_InvalidEndpoint_ReturnsInvalidConfiguration()
{
// Arrange
var handler = new MockHttpMessageHandler(HttpStatusCode.OK, "ok");
var httpClient = new HttpClient(handler);
var auditRepo = new InMemoryAuditRepository();
var options = Options.Create(new ChannelAdapterOptions());
var adapter = new WebhookChannelAdapter(
httpClient,
auditRepo,
options,
TimeProvider.System,
NullLogger<WebhookChannelAdapter>.Instance);
var channel = CreateChannel(null);
var context = CreateContext(channel);
// Act
var result = await adapter.DispatchAsync(context, CancellationToken.None);
// Assert
Assert.False(result.Success);
Assert.Equal(ChannelDispatchStatus.InvalidConfiguration, result.Status);
Assert.Empty(handler.Requests);
}
[Fact]
public async Task DispatchAsync_RateLimited_ReturnsThrottled()
{
// Arrange
var handler = new MockHttpMessageHandler(HttpStatusCode.TooManyRequests, "rate limited");
var httpClient = new HttpClient(handler);
var auditRepo = new InMemoryAuditRepository();
var options = Options.Create(new ChannelAdapterOptions { MaxRetries = 0 });
var adapter = new WebhookChannelAdapter(
httpClient,
auditRepo,
options,
TimeProvider.System,
NullLogger<WebhookChannelAdapter>.Instance);
var channel = CreateChannel("https://example.com/webhook");
var context = CreateContext(channel);
// Act
var result = await adapter.DispatchAsync(context, CancellationToken.None);
// Assert
Assert.False(result.Success);
Assert.Equal(ChannelDispatchStatus.Throttled, result.Status);
Assert.Equal(429, result.HttpStatusCode);
}
[Fact]
public async Task DispatchAsync_ServerError_RetriesAndFails()
{
// Arrange
var handler = new MockHttpMessageHandler(HttpStatusCode.ServiceUnavailable, "unavailable");
var httpClient = new HttpClient(handler);
var auditRepo = new InMemoryAuditRepository();
var options = Options.Create(new ChannelAdapterOptions
{
MaxRetries = 2,
RetryBaseDelay = TimeSpan.FromMilliseconds(10),
RetryMaxDelay = TimeSpan.FromMilliseconds(50)
});
var adapter = new WebhookChannelAdapter(
httpClient,
auditRepo,
options,
TimeProvider.System,
NullLogger<WebhookChannelAdapter>.Instance);
var channel = CreateChannel("https://example.com/webhook");
var context = CreateContext(channel);
// Act
var result = await adapter.DispatchAsync(context, CancellationToken.None);
// Assert
Assert.False(result.Success);
Assert.Equal(3, handler.Requests.Count); // Initial + 2 retries
}
[Fact]
public async Task CheckHealthAsync_ValidEndpoint_ReturnsHealthy()
{
// Arrange
var handler = new MockHttpMessageHandler(HttpStatusCode.OK, "ok");
var httpClient = new HttpClient(handler);
var auditRepo = new InMemoryAuditRepository();
var options = Options.Create(new ChannelAdapterOptions());
var adapter = new WebhookChannelAdapter(
httpClient,
auditRepo,
options,
TimeProvider.System,
NullLogger<WebhookChannelAdapter>.Instance);
var channel = CreateChannel("https://example.com/webhook");
// Act
var result = await adapter.CheckHealthAsync(channel, CancellationToken.None);
// Assert
Assert.True(result.Healthy);
Assert.Equal("healthy", result.Status);
}
[Fact]
public async Task CheckHealthAsync_DisabledChannel_ReturnsDegraded()
{
// Arrange
var handler = new MockHttpMessageHandler(HttpStatusCode.OK, "ok");
var httpClient = new HttpClient(handler);
var auditRepo = new InMemoryAuditRepository();
var options = Options.Create(new ChannelAdapterOptions());
var adapter = new WebhookChannelAdapter(
httpClient,
auditRepo,
options,
TimeProvider.System,
NullLogger<WebhookChannelAdapter>.Instance);
var channel = CreateChannel("https://example.com/webhook", enabled: false);
// Act
var result = await adapter.CheckHealthAsync(channel, CancellationToken.None);
// Assert
Assert.True(result.Healthy);
Assert.Equal("degraded", result.Status);
}
private static NotifyChannel CreateChannel(string? endpoint, bool enabled = true)
{
return NotifyChannel.Create(
channelId: "test-channel",
tenantId: "test-tenant",
name: "Test Webhook",
type: NotifyChannelType.Webhook,
config: NotifyChannelConfig.Create(
secretRef: "secret://test",
endpoint: endpoint),
enabled: enabled);
}
private static ChannelDispatchContext CreateContext(NotifyChannel channel)
{
var delivery = NotifyDelivery.Create(
deliveryId: "delivery-001",
tenantId: channel.TenantId,
ruleId: "rule-001",
actionId: "action-001",
eventId: "event-001",
kind: "test",
status: NotifyDeliveryStatus.Pending);
return new ChannelDispatchContext(
DeliveryId: delivery.DeliveryId,
TenantId: channel.TenantId,
Channel: channel,
Delivery: delivery,
RenderedBody: """{"message": "test notification"}""",
Subject: "Test Subject",
Metadata: new Dictionary<string, string>(),
Timestamp: DateTimeOffset.UtcNow,
TraceId: "trace-001");
}
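// Test double: records every outgoing request and answers each one with the same fixed status code and body.
private sealed class MockHttpMessageHandler : HttpMessageHandler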
{
private readonly HttpStatusCode _statusCode;
private readonly string _content;
public List<HttpRequestMessage> Requests { get; } = [];
public MockHttpMessageHandler(HttpStatusCode statusCode, string content)
{
_statusCode = statusCode;
_content = content;
}
protected override Task<HttpResponseMessage> SendAsync(
HttpRequestMessage request,
CancellationToken cancellationToken)
{
Requests.Add(request);
var response = new HttpResponseMessage(_statusCode)
{
Content = new StringContent(_content)
};
return Task.FromResult(response);
}
}
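// In-memory stand-in for the audit repository; appended entries are kept in a list so tests can inspect them.
private sealed class InMemoryAuditRepository : StellaOps.Notify.Storage.Mongo.Repositories.INotifyAuditRepository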
{
public List<(string TenantId, string EventType, string Actor, IReadOnlyDictionary<string, string> Metadata)> Entries { get; } = [];
public Task AppendAsync(
string tenantId,
string eventType,
string actor,
IReadOnlyDictionary<string, string> metadata,
CancellationToken cancellationToken)
{
Entries.Add((tenantId, eventType, actor, metadata));
return Task.CompletedTask;
}
}
}


@@ -0,0 +1,445 @@
using System.Text.Json.Nodes;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Moq;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class CorrelationEngineTests
{
private readonly Mock<ICorrelationKeyBuilderFactory> _keyBuilderFactory;
private readonly Mock<ICorrelationKeyBuilder> _keyBuilder;
private readonly Mock<IIncidentManager> _incidentManager;
private readonly Mock<INotifyThrottler> _throttler;
private readonly Mock<IQuietHoursEvaluator> _quietHoursEvaluator;
private readonly CorrelationEngineOptions _options;
private readonly CorrelationEngine _engine;
public CorrelationEngineTests()
{
_keyBuilderFactory = new Mock<ICorrelationKeyBuilderFactory>();
_keyBuilder = new Mock<ICorrelationKeyBuilder>();
_incidentManager = new Mock<IIncidentManager>();
_throttler = new Mock<INotifyThrottler>();
_quietHoursEvaluator = new Mock<IQuietHoursEvaluator>();
_options = new CorrelationEngineOptions();
_keyBuilderFactory
.Setup(f => f.GetBuilder(It.IsAny<string>()))
.Returns(_keyBuilder.Object);
_keyBuilder
.Setup(b => b.BuildKey(It.IsAny<NotifyEvent>(), It.IsAny<CorrelationKeyExpression>()))
.Returns("test-correlation-key");
_keyBuilder.SetupGet(b => b.Name).Returns("composite");
_engine = new CorrelationEngine(
_keyBuilderFactory.Object,
_incidentManager.Object,
_throttler.Object,
_quietHoursEvaluator.Object,
Options.Create(_options),
NullLogger<CorrelationEngine>.Instance);
}
[Fact]
public async Task CorrelateAsync_NewIncident_ReturnsNewIncidentResult()
{
// Arrange
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 0);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 1 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.True(result.Correlated);
Assert.True(result.IsNewIncident);
Assert.True(result.ShouldNotify);
Assert.Equal("inc-test123", result.IncidentId);
Assert.Equal("test-correlation-key", result.CorrelationKey);
}
[Fact]
public async Task CorrelateAsync_ExistingIncident_FirstOnlyPolicy_DoesNotNotify()
{
// Arrange
_options.NotificationPolicy = NotificationPolicy.FirstOnly;
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 5);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 6 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.False(result.IsNewIncident);
Assert.False(result.ShouldNotify);
}
[Fact]
public async Task CorrelateAsync_ExistingIncident_EveryEventPolicy_Notifies()
{
// Arrange
_options.NotificationPolicy = NotificationPolicy.EveryEvent;
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 5);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 6 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.False(result.IsNewIncident);
Assert.True(result.ShouldNotify);
}
[Fact]
public async Task CorrelateAsync_Suppressed_DoesNotNotify()
{
// Arrange
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 0);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 1 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.Suppressed("Quiet hours", "quiet_hours"));
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.False(result.ShouldNotify);
Assert.Equal("Quiet hours", result.SuppressionReason);
}
[Fact]
public async Task CorrelateAsync_Throttled_DoesNotNotify()
{
// Arrange
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 0);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 1 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.Throttled(15));
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.False(result.ShouldNotify);
Assert.Contains("Throttled", result.SuppressionReason);
}
[Fact]
public async Task CorrelateAsync_UsesEventKindSpecificKeyExpression()
{
// Arrange
var customExpression = new CorrelationKeyExpression
{
Type = "template",
Template = "{{tenant}}-{{kind}}"
};
_options.KeyExpressions["security.alert"] = customExpression;
var notifyEvent = CreateTestEvent("security.alert");
var incident = CreateTestIncident(eventCount: 0);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 1 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
await _engine.CorrelateAsync(notifyEvent);
// Assert
_keyBuilderFactory.Verify(f => f.GetBuilder("template"), Times.Once);
}
[Fact]
public async Task CorrelateAsync_UsesWildcardKeyExpression()
{
// Arrange
var customExpression = new CorrelationKeyExpression
{
Type = "custom",
Fields = ["source"]
};
_options.KeyExpressions["security.*"] = customExpression;
var notifyEvent = CreateTestEvent("security.vulnerability");
var incident = CreateTestIncident(eventCount: 0);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 1 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
await _engine.CorrelateAsync(notifyEvent);
// Assert
_keyBuilderFactory.Verify(f => f.GetBuilder("custom"), Times.Once);
}
[Fact]
public async Task CorrelateAsync_OnEscalationPolicy_NotifiesAtThreshold()
{
// Arrange
_options.NotificationPolicy = NotificationPolicy.OnEscalation;
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 4); // Will become 5 after record
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 5 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.True(result.ShouldNotify);
}
[Fact]
public async Task CorrelateAsync_OnEscalationPolicy_NotifiesOnCriticalSeverity()
{
// Arrange
_options.NotificationPolicy = NotificationPolicy.OnEscalation;
var payload = new JsonObject { ["severity"] = "CRITICAL" };
var notifyEvent = CreateTestEvent(payload: payload);
var incident = CreateTestIncident(eventCount: 2);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 3 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.True(result.ShouldNotify);
}
[Fact]
public async Task CorrelateAsync_PeriodicPolicy_NotifiesAtInterval()
{
// Arrange
_options.NotificationPolicy = NotificationPolicy.Periodic;
_options.PeriodicNotificationInterval = 5;
var notifyEvent = CreateTestEvent();
var incident = CreateTestIncident(eventCount: 9);
_incidentManager
.Setup(m => m.GetOrCreateIncidentAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident);
_incidentManager
.Setup(m => m.RecordEventAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(incident with { EventCount = 10 });
_quietHoursEvaluator
.Setup(e => e.EvaluateAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(SuppressionCheckResult.NotSuppressed());
_throttler
.Setup(t => t.CheckAsync(
It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(ThrottleCheckResult.NotThrottled());
// Act
var result = await _engine.CorrelateAsync(notifyEvent);
// Assert
Assert.True(result.ShouldNotify);
}
[Fact]
public async Task CheckThrottleAsync_ThrottlingDisabled_ReturnsNotThrottled()
{
// Arrange
_options.ThrottlingEnabled = false;
// Act
var result = await _engine.CheckThrottleAsync("tenant1", "key1", null);
// Assert
Assert.False(result.IsThrottled);
_throttler.Verify(
t => t.CheckAsync(It.IsAny<string>(), It.IsAny<string>(), It.IsAny<TimeSpan?>(), It.IsAny<int?>(), It.IsAny<CancellationToken>()),
Times.Never);
}
private static NotifyEvent CreateTestEvent(string? kind = null, JsonObject? payload = null)
{
return new NotifyEvent
{
EventId = Guid.NewGuid(),
Tenant = "tenant1",
Kind = kind ?? "test.event",
Payload = payload ?? new JsonObject(),
Timestamp = DateTimeOffset.UtcNow
};
}
private static IncidentState CreateTestIncident(int eventCount)
{
return new IncidentState
{
IncidentId = "inc-test123",
TenantId = "tenant1",
CorrelationKey = "test-correlation-key",
EventKind = "test.event",
Title = "Test Incident",
Status = IncidentStatus.Open,
EventCount = eventCount,
FirstOccurrence = DateTimeOffset.UtcNow.AddHours(-1),
LastOccurrence = DateTimeOffset.UtcNow
};
}
}


@@ -0,0 +1,411 @@
using System.Text.Json.Nodes;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class CompositeCorrelationKeyBuilderTests
{
private readonly CompositeCorrelationKeyBuilder _builder = new();
[Fact]
public void Name_ReturnsComposite()
{
Assert.Equal("composite", _builder.Name);
}
[Fact]
public void CanHandle_CompositeType_ReturnsTrue()
{
Assert.True(_builder.CanHandle("composite"));
Assert.True(_builder.CanHandle("COMPOSITE"));
Assert.True(_builder.CanHandle("Composite"));
}
[Fact]
public void CanHandle_OtherType_ReturnsFalse()
{
Assert.False(_builder.CanHandle("template"));
Assert.False(_builder.CanHandle("jsonpath"));
}
[Fact]
public void BuildKey_TenantAndKindOnly_BuildsCorrectKey()
{
// Arrange
var notifyEvent = CreateTestEvent("tenant1", "security.alert");
var expression = new CorrelationKeyExpression
{
Type = "composite",
IncludeTenant = true,
IncludeEventKind = true
};
// Act
var key1 = _builder.BuildKey(notifyEvent, expression);
var key2 = _builder.BuildKey(notifyEvent, expression);
// Assert
Assert.NotNull(key1);
Assert.Equal(16, key1.Length); // SHA256 hash truncated to 16 chars
Assert.Equal(key1, key2); // Same input should produce same key
}
[Fact]
public void BuildKey_DifferentTenants_ProducesDifferentKeys()
{
// Arrange
var event1 = CreateTestEvent("tenant1", "security.alert");
var event2 = CreateTestEvent("tenant2", "security.alert");
var expression = CorrelationKeyExpression.Default;
// Act
var key1 = _builder.BuildKey(event1, expression);
var key2 = _builder.BuildKey(event2, expression);
// Assert
Assert.NotEqual(key1, key2);
}
[Fact]
public void BuildKey_DifferentKinds_ProducesDifferentKeys()
{
// Arrange
var event1 = CreateTestEvent("tenant1", "security.alert");
var event2 = CreateTestEvent("tenant1", "security.warning");
var expression = CorrelationKeyExpression.Default;
// Act
var key1 = _builder.BuildKey(event1, expression);
var key2 = _builder.BuildKey(event2, expression);
// Assert
Assert.NotEqual(key1, key2);
}
[Fact]
public void BuildKey_WithPayloadFields_IncludesFieldValues()
{
// Arrange
var payload1 = new JsonObject { ["source"] = "scanner-1" };
var payload2 = new JsonObject { ["source"] = "scanner-2" };
var event1 = CreateTestEvent("tenant1", "security.alert", payload1);
var event2 = CreateTestEvent("tenant1", "security.alert", payload2);
var expression = new CorrelationKeyExpression
{
Type = "composite",
IncludeTenant = true,
IncludeEventKind = true,
Fields = ["source"]
};
// Act
var key1 = _builder.BuildKey(event1, expression);
var key2 = _builder.BuildKey(event2, expression);
// Assert
Assert.NotEqual(key1, key2);
}
[Fact]
public void BuildKey_WithNestedPayloadField_ExtractsValue()
{
// Arrange
var payload = new JsonObject
{
["resource"] = new JsonObject { ["id"] = "resource-123" }
};
var notifyEvent = CreateTestEvent("tenant1", "test.event", payload);
var expression = new CorrelationKeyExpression
{
Type = "composite",
IncludeTenant = true,
Fields = ["resource.id"]
};
// Act
var key1 = _builder.BuildKey(notifyEvent, expression);
// Different resource ID
payload["resource"]!["id"] = "resource-456";
var key2 = _builder.BuildKey(notifyEvent, expression);
// Assert
Assert.NotEqual(key1, key2);
}
[Fact]
public void BuildKey_MissingPayloadField_IgnoresField()
{
// Arrange
var payload = new JsonObject { ["existing"] = "value" };
var notifyEvent = CreateTestEvent("tenant1", "test.event", payload);
var expression = new CorrelationKeyExpression
{
Type = "composite",
IncludeTenant = true,
Fields = ["nonexistent", "existing"]
};
// Act - should not throw
var key = _builder.BuildKey(notifyEvent, expression);
// Assert
Assert.NotNull(key);
}
[Fact]
public void BuildKey_ExcludeTenant_DoesNotIncludeTenant()
{
// Arrange
var event1 = CreateTestEvent("tenant1", "test.event");
var event2 = CreateTestEvent("tenant2", "test.event");
var expression = new CorrelationKeyExpression
{
Type = "composite",
IncludeTenant = false,
IncludeEventKind = true
};
// Act
var key1 = _builder.BuildKey(event1, expression);
var key2 = _builder.BuildKey(event2, expression);
// Assert - keys should be the same since tenant is excluded
Assert.Equal(key1, key2);
}
private static NotifyEvent CreateTestEvent(string tenant, string kind, JsonObject? payload = null)
{
return new NotifyEvent
{
EventId = Guid.NewGuid(),
Tenant = tenant,
Kind = kind,
Payload = payload ?? new JsonObject(),
Timestamp = DateTimeOffset.UtcNow
};
}
}
public class TemplateCorrelationKeyBuilderTests
{
private readonly TemplateCorrelationKeyBuilder _builder = new();
[Fact]
public void Name_ReturnsTemplate()
{
Assert.Equal("template", _builder.Name);
}
[Fact]
public void CanHandle_TemplateType_ReturnsTrue()
{
Assert.True(_builder.CanHandle("template"));
Assert.True(_builder.CanHandle("TEMPLATE"));
}
[Fact]
public void BuildKey_SimpleTemplate_SubstitutesVariables()
{
// Arrange
var notifyEvent = CreateTestEvent("tenant1", "security.alert");
var expression = new CorrelationKeyExpression
{
Type = "template",
Template = "{{tenant}}-{{kind}}",
IncludeTenant = false
};
// Act
var key = _builder.BuildKey(notifyEvent, expression);
// Assert
Assert.NotNull(key);
Assert.Equal(16, key.Length);
}
[Fact]
public void BuildKey_WithPayloadVariables_SubstitutesValues()
{
// Arrange
var payload = new JsonObject { ["region"] = "us-east-1" };
var notifyEvent = CreateTestEvent("tenant1", "test.event", payload);
var expression = new CorrelationKeyExpression
{
Type = "template",
Template = "{{kind}}-{{region}}",
IncludeTenant = false
};
// Act
var key1 = _builder.BuildKey(notifyEvent, expression);
payload["region"] = "eu-west-1";
var key2 = _builder.BuildKey(notifyEvent, expression);
// Assert
Assert.NotEqual(key1, key2);
}
[Fact]
public void BuildKey_WithAttributeVariables_SubstitutesValues()
{
// Arrange
var notifyEvent = new NotifyEvent
{
EventId = Guid.NewGuid(),
Tenant = "tenant1",
Kind = "test.event",
Payload = new JsonObject(),
Timestamp = DateTimeOffset.UtcNow,
Attributes = new Dictionary<string, string>
{
["env"] = "production"
}
};
var expression = new CorrelationKeyExpression
{
Type = "template",
Template = "{{kind}}-{{attr.env}}",
IncludeTenant = false
};
// Act
var key = _builder.BuildKey(notifyEvent, expression);
// Assert
Assert.NotNull(key);
}
[Fact]
public void BuildKey_IncludeTenant_PrependsTenantToKey()
{
// Arrange
var event1 = CreateTestEvent("tenant1", "test.event");
var event2 = CreateTestEvent("tenant2", "test.event");
var expression = new CorrelationKeyExpression
{
Type = "template",
Template = "{{kind}}",
IncludeTenant = true
};
// Act
var key1 = _builder.BuildKey(event1, expression);
var key2 = _builder.BuildKey(event2, expression);
// Assert
Assert.NotEqual(key1, key2);
}
[Fact]
public void BuildKey_NoTemplate_ThrowsException()
{
// Arrange
var notifyEvent = CreateTestEvent("tenant1", "test.event");
var expression = new CorrelationKeyExpression
{
Type = "template",
Template = null
};
// Act & Assert
Assert.Throws<ArgumentException>(() => _builder.BuildKey(notifyEvent, expression));
}
[Fact]
public void BuildKey_EmptyTemplate_ThrowsException()
{
// Arrange
var notifyEvent = CreateTestEvent("tenant1", "test.event");
var expression = new CorrelationKeyExpression
{
Type = "template",
Template = " "
};
// Act & Assert
Assert.Throws<ArgumentException>(() => _builder.BuildKey(notifyEvent, expression));
}
private static NotifyEvent CreateTestEvent(string tenant, string kind, JsonObject? payload = null)
{
return new NotifyEvent
{
EventId = Guid.NewGuid(),
Tenant = tenant,
Kind = kind,
Payload = payload ?? new JsonObject(),
Timestamp = DateTimeOffset.UtcNow
};
}
}
public class CorrelationKeyBuilderFactoryTests
{
[Fact]
public void GetBuilder_KnownType_ReturnsCorrectBuilder()
{
// Arrange
var builders = new ICorrelationKeyBuilder[]
{
new CompositeCorrelationKeyBuilder(),
new TemplateCorrelationKeyBuilder()
};
var factory = new CorrelationKeyBuilderFactory(builders);
// Act
var compositeBuilder = factory.GetBuilder("composite");
var templateBuilder = factory.GetBuilder("template");
// Assert
Assert.IsType<CompositeCorrelationKeyBuilder>(compositeBuilder);
Assert.IsType<TemplateCorrelationKeyBuilder>(templateBuilder);
}
[Fact]
public void GetBuilder_UnknownType_ReturnsDefaultBuilder()
{
// Arrange
var builders = new ICorrelationKeyBuilder[]
{
new CompositeCorrelationKeyBuilder(),
new TemplateCorrelationKeyBuilder()
};
var factory = new CorrelationKeyBuilderFactory(builders);
// Act
var builder = factory.GetBuilder("unknown");
// Assert
Assert.IsType<CompositeCorrelationKeyBuilder>(builder);
}
[Fact]
public void GetBuilder_CaseInsensitive_ReturnsCorrectBuilder()
{
// Arrange
var builders = new ICorrelationKeyBuilder[]
{
new CompositeCorrelationKeyBuilder(),
new TemplateCorrelationKeyBuilder()
};
var factory = new CorrelationKeyBuilderFactory(builders);
// Act
var builder1 = factory.GetBuilder("COMPOSITE");
var builder2 = factory.GetBuilder("Template");
// Assert
Assert.IsType<CompositeCorrelationKeyBuilder>(builder1);
Assert.IsType<TemplateCorrelationKeyBuilder>(builder2);
}
}


@@ -0,0 +1,361 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class InMemoryIncidentManagerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly IncidentManagerOptions _options;
private readonly InMemoryIncidentManager _manager;
public InMemoryIncidentManagerTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new IncidentManagerOptions
{
CorrelationWindow = TimeSpan.FromHours(1),
ReopenOnNewEvent = true
};
_manager = new InMemoryIncidentManager(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryIncidentManager>.Instance);
}
[Fact]
public async Task GetOrCreateIncidentAsync_CreatesNewIncident()
{
// Act
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Assert
Assert.NotNull(incident);
Assert.StartsWith("inc-", incident.IncidentId);
Assert.Equal("tenant1", incident.TenantId);
Assert.Equal("correlation-key", incident.CorrelationKey);
Assert.Equal("security.alert", incident.EventKind);
Assert.Equal("Test Alert", incident.Title);
Assert.Equal(IncidentStatus.Open, incident.Status);
Assert.Equal(0, incident.EventCount);
}
[Fact]
public async Task GetOrCreateIncidentAsync_ReturnsSameIncidentWithinWindow()
{
// Arrange
var incident1 = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act - request again within correlation window
_timeProvider.Advance(TimeSpan.FromMinutes(30));
var incident2 = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Assert
Assert.Equal(incident1.IncidentId, incident2.IncidentId);
}
[Fact]
public async Task GetOrCreateIncidentAsync_CreatesNewIncidentOutsideWindow()
{
// Arrange
var incident1 = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Record an event to set LastOccurrence
await _manager.RecordEventAsync("tenant1", incident1.IncidentId, "event-1");
// Act - request again outside correlation window
_timeProvider.Advance(TimeSpan.FromHours(2));
var incident2 = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Assert
Assert.NotEqual(incident1.IncidentId, incident2.IncidentId);
}
[Fact]
public async Task GetOrCreateIncidentAsync_CreatesNewIncidentAfterResolution()
{
// Arrange
var incident1 = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
await _manager.ResolveAsync("tenant1", incident1.IncidentId, "operator");
// Act - request again after resolution
var incident2 = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Assert
Assert.NotEqual(incident1.IncidentId, incident2.IncidentId);
}
[Fact]
public async Task RecordEventAsync_IncrementsEventCount()
{
// Arrange
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act
var updated = await _manager.RecordEventAsync("tenant1", incident.IncidentId, "event-1");
// Assert
Assert.Equal(1, updated.EventCount);
Assert.Contains("event-1", updated.EventIds);
}
[Fact]
public async Task RecordEventAsync_UpdatesLastOccurrence()
{
// Arrange
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
var initialTime = incident.LastOccurrence;
// Act
_timeProvider.Advance(TimeSpan.FromMinutes(10));
var updated = await _manager.RecordEventAsync("tenant1", incident.IncidentId, "event-1");
// Assert
Assert.True(updated.LastOccurrence > initialTime);
}
[Fact]
public async Task RecordEventAsync_ReopensAcknowledgedIncident_WhenConfigured()
{
// Arrange
_options.ReopenOnNewEvent = true;
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
await _manager.AcknowledgeAsync("tenant1", incident.IncidentId, "operator");
// Act
var updated = await _manager.RecordEventAsync("tenant1", incident.IncidentId, "event-1");
// Assert
Assert.Equal(IncidentStatus.Open, updated.Status);
}
[Fact]
public async Task RecordEventAsync_ThrowsForUnknownIncident()
{
// Act & Assert
await Assert.ThrowsAsync<InvalidOperationException>(
() => _manager.RecordEventAsync("tenant1", "unknown-id", "event-1"));
}
[Fact]
public async Task AcknowledgeAsync_SetsAcknowledgedStatus()
{
// Arrange
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act
var acknowledged = await _manager.AcknowledgeAsync(
"tenant1", incident.IncidentId, "operator", "Looking into it");
// Assert
Assert.NotNull(acknowledged);
Assert.Equal(IncidentStatus.Acknowledged, acknowledged.Status);
Assert.Equal("operator", acknowledged.AcknowledgedBy);
Assert.NotNull(acknowledged.AcknowledgedAt);
Assert.Equal("Looking into it", acknowledged.AcknowledgeComment);
}
[Fact]
public async Task AcknowledgeAsync_ReturnsNullForUnknownIncident()
{
// Act
var result = await _manager.AcknowledgeAsync("tenant1", "unknown-id", "operator");
// Assert
Assert.Null(result);
}
[Fact]
public async Task AcknowledgeAsync_ReturnsNullForWrongTenant()
{
// Arrange
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act
var result = await _manager.AcknowledgeAsync("tenant2", incident.IncidentId, "operator");
// Assert
Assert.Null(result);
}
[Fact]
public async Task AcknowledgeAsync_DoesNotChangeResolvedIncident()
{
// Arrange
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
await _manager.ResolveAsync("tenant1", incident.IncidentId, "operator");
// Act
var result = await _manager.AcknowledgeAsync("tenant1", incident.IncidentId, "operator2");
// Assert
Assert.NotNull(result);
Assert.Equal(IncidentStatus.Resolved, result.Status);
}
[Fact]
public async Task ResolveAsync_SetsResolvedStatus()
{
// Arrange
var incident = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act
var resolved = await _manager.ResolveAsync(
"tenant1", incident.IncidentId, "operator", "Issue fixed");
// Assert
Assert.NotNull(resolved);
Assert.Equal(IncidentStatus.Resolved, resolved.Status);
Assert.Equal("operator", resolved.ResolvedBy);
Assert.NotNull(resolved.ResolvedAt);
Assert.Equal("Issue fixed", resolved.ResolutionReason);
}
[Fact]
public async Task ResolveAsync_ReturnsNullForUnknownIncident()
{
// Act
var result = await _manager.ResolveAsync("tenant1", "unknown-id", "operator");
// Assert
Assert.Null(result);
}
[Fact]
public async Task GetAsync_ReturnsIncident()
{
// Arrange
var created = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act
var result = await _manager.GetAsync("tenant1", created.IncidentId);
// Assert
Assert.NotNull(result);
Assert.Equal(created.IncidentId, result.IncidentId);
}
[Fact]
public async Task GetAsync_ReturnsNullForUnknownIncident()
{
// Act
var result = await _manager.GetAsync("tenant1", "unknown-id");
// Assert
Assert.Null(result);
}
[Fact]
public async Task GetAsync_ReturnsNullForWrongTenant()
{
// Arrange
var created = await _manager.GetOrCreateIncidentAsync(
"tenant1", "correlation-key", "security.alert", "Test Alert");
// Act
var result = await _manager.GetAsync("tenant2", created.IncidentId);
// Assert
Assert.Null(result);
}
[Fact]
public async Task ListAsync_ReturnsIncidentsForTenant()
{
// Arrange
await _manager.GetOrCreateIncidentAsync("tenant1", "key1", "event1", "Alert 1");
await _manager.GetOrCreateIncidentAsync("tenant1", "key2", "event2", "Alert 2");
await _manager.GetOrCreateIncidentAsync("tenant2", "key3", "event3", "Alert 3");
// Act
var result = await _manager.ListAsync("tenant1");
// Assert
Assert.Equal(2, result.Count);
Assert.All(result, i => Assert.Equal("tenant1", i.TenantId));
}
[Fact]
public async Task ListAsync_FiltersByStatus()
{
// Arrange
var inc1 = await _manager.GetOrCreateIncidentAsync("tenant1", "key1", "event1", "Alert 1");
var inc2 = await _manager.GetOrCreateIncidentAsync("tenant1", "key2", "event2", "Alert 2");
await _manager.AcknowledgeAsync("tenant1", inc1.IncidentId, "operator");
await _manager.ResolveAsync("tenant1", inc2.IncidentId, "operator");
// Act
var openIncidents = await _manager.ListAsync("tenant1", IncidentStatus.Open);
var acknowledgedIncidents = await _manager.ListAsync("tenant1", IncidentStatus.Acknowledged);
var resolvedIncidents = await _manager.ListAsync("tenant1", IncidentStatus.Resolved);
// Assert
Assert.Empty(openIncidents);
Assert.Single(acknowledgedIncidents);
Assert.Single(resolvedIncidents);
}
[Fact]
public async Task ListAsync_OrdersByLastOccurrenceDescending()
{
// Arrange
var inc1 = await _manager.GetOrCreateIncidentAsync("tenant1", "key1", "event1", "Alert 1");
await _manager.RecordEventAsync("tenant1", inc1.IncidentId, "e1");
_timeProvider.Advance(TimeSpan.FromMinutes(1));
var inc2 = await _manager.GetOrCreateIncidentAsync("tenant1", "key2", "event2", "Alert 2");
await _manager.RecordEventAsync("tenant1", inc2.IncidentId, "e2");
_timeProvider.Advance(TimeSpan.FromMinutes(1));
var inc3 = await _manager.GetOrCreateIncidentAsync("tenant1", "key3", "event3", "Alert 3");
await _manager.RecordEventAsync("tenant1", inc3.IncidentId, "e3");
// Act
var result = await _manager.ListAsync("tenant1");
// Assert
Assert.Equal(3, result.Count);
Assert.Equal(inc3.IncidentId, result[0].IncidentId);
Assert.Equal(inc2.IncidentId, result[1].IncidentId);
Assert.Equal(inc1.IncidentId, result[2].IncidentId);
}
[Fact]
public async Task ListAsync_RespectsLimit()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _manager.GetOrCreateIncidentAsync("tenant1", $"key{i}", $"event{i}", $"Alert {i}");
}
// Act
var result = await _manager.ListAsync("tenant1", limit: 5);
// Assert
Assert.Equal(5, result.Count);
}
}


@@ -0,0 +1,269 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class InMemoryNotifyThrottlerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly ThrottlerOptions _options;
private readonly InMemoryNotifyThrottler _throttler;
public InMemoryNotifyThrottlerTests()
{
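// Fixed start time keeps the tests deterministic (matches the other test classes)
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 14, 0, 0, TimeSpan.Zero));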
_options = new ThrottlerOptions
{
DefaultWindow = TimeSpan.FromMinutes(5),
DefaultMaxEvents = 10,
Enabled = true
};
_throttler = new InMemoryNotifyThrottler(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryNotifyThrottler>.Instance);
}
[Fact]
public async Task RecordEventAsync_AddsEventToState()
{
// Act
await _throttler.RecordEventAsync("tenant1", "key1");
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.False(result.IsThrottled);
Assert.Equal(1, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_NoEvents_ReturnsNotThrottled()
{
// Act
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.False(result.IsThrottled);
Assert.Equal(0, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_BelowThreshold_ReturnsNotThrottled()
{
// Arrange
for (int i = 0; i < 5; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.False(result.IsThrottled);
Assert.Equal(5, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_AtThreshold_ReturnsThrottled()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.True(result.IsThrottled);
Assert.Equal(10, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_AboveThreshold_ReturnsThrottled()
{
// Arrange
for (int i = 0; i < 15; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.True(result.IsThrottled);
Assert.Equal(15, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_EventsOutsideWindow_AreRemoved()
{
// Arrange
for (int i = 0; i < 8; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Move time forward past the window
_timeProvider.Advance(TimeSpan.FromMinutes(6));
// Act
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.False(result.IsThrottled);
Assert.Equal(0, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_CustomWindow_UsesCustomValue()
{
// Arrange
for (int i = 0; i < 5; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Move time forward 2 minutes
_timeProvider.Advance(TimeSpan.FromMinutes(2));
// Add more events
for (int i = 0; i < 3; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act - check with 1 minute window (should only see recent 3)
var result = await _throttler.CheckAsync("tenant1", "key1", TimeSpan.FromMinutes(1), null);
// Assert
Assert.False(result.IsThrottled);
Assert.Equal(3, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_CustomMaxEvents_UsesCustomValue()
{
// Arrange
for (int i = 0; i < 5; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act - check with max 3 events
var result = await _throttler.CheckAsync("tenant1", "key1", null, 3);
// Assert
Assert.True(result.IsThrottled);
Assert.Equal(5, result.RecentEventCount);
}
[Fact]
public async Task CheckAsync_ThrottledReturnsResetTime()
{
// Arrange
await _throttler.RecordEventAsync("tenant1", "key1");
// Move time forward 2 minutes
_timeProvider.Advance(TimeSpan.FromMinutes(2));
// Fill up to threshold
for (int i = 0; i < 9; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act
var result = await _throttler.CheckAsync("tenant1", "key1", null, null);
// Assert
Assert.True(result.IsThrottled);
Assert.NotNull(result.ThrottleResetIn);
// Reset should be ~3 minutes (5 min window - 2 min since oldest event)
Assert.True(result.ThrottleResetIn.Value > TimeSpan.FromMinutes(2));
}
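// Worked arithmetic behind the "~3 minutes" expectation above. This is a hedged sketch:
// it assumes the reset time is (oldest event + window) - now, which the comment in the
// preceding test implies. The throttler's real computation is not shown in this file.
[Fact]
public void ThrottleResetArithmetic_Sketch()
{
// Arrange: the oldest event is recorded "now"; the window mirrors DefaultWindow above
var oldestEvent = _timeProvider.GetUtcNow();
var window = TimeSpan.FromMinutes(5);
// Act: two minutes pass before the check
_timeProvider.Advance(TimeSpan.FromMinutes(2));
var resetIn = oldestEvent + window - _timeProvider.GetUtcNow();
// Assert: 5 min window - 2 min elapsed = 3 min remaining
Assert.Equal(TimeSpan.FromMinutes(3), resetIn);
}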
[Fact]
public async Task CheckAsync_DifferentKeys_TrackedSeparately()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act
var result1 = await _throttler.CheckAsync("tenant1", "key1", null, null);
var result2 = await _throttler.CheckAsync("tenant1", "key2", null, null);
// Assert
Assert.True(result1.IsThrottled);
Assert.False(result2.IsThrottled);
}
[Fact]
public async Task CheckAsync_DifferentTenants_TrackedSeparately()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Act
var result1 = await _throttler.CheckAsync("tenant1", "key1", null, null);
var result2 = await _throttler.CheckAsync("tenant2", "key1", null, null);
// Assert
Assert.True(result1.IsThrottled);
Assert.False(result2.IsThrottled);
}
[Fact]
public async Task ClearAsync_RemovesThrottleState()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
}
// Verify throttled
var beforeClear = await _throttler.CheckAsync("tenant1", "key1", null, null);
Assert.True(beforeClear.IsThrottled);
// Act
await _throttler.ClearAsync("tenant1", "key1");
// Assert
var afterClear = await _throttler.CheckAsync("tenant1", "key1", null, null);
Assert.False(afterClear.IsThrottled);
Assert.Equal(0, afterClear.RecentEventCount);
}
[Fact]
public async Task ClearAsync_OnlyAffectsSpecifiedKey()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _throttler.RecordEventAsync("tenant1", "key1");
await _throttler.RecordEventAsync("tenant1", "key2");
}
// Act
await _throttler.ClearAsync("tenant1", "key1");
// Assert
var result1 = await _throttler.CheckAsync("tenant1", "key1", null, null);
var result2 = await _throttler.CheckAsync("tenant1", "key2", null, null);
Assert.False(result1.IsThrottled);
Assert.True(result2.IsThrottled);
}
}


@@ -0,0 +1,451 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class OperatorOverrideServiceTests
{
private readonly Mock<ISuppressionAuditLogger> _auditLogger;
private readonly FakeTimeProvider _timeProvider;
private readonly OperatorOverrideOptions _options;
private readonly InMemoryOperatorOverrideService _service;
public OperatorOverrideServiceTests()
{
_auditLogger = new Mock<ISuppressionAuditLogger>();
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 14, 0, 0, TimeSpan.Zero));
_options = new OperatorOverrideOptions
{
MinDuration = TimeSpan.FromMinutes(5),
MaxDuration = TimeSpan.FromHours(24),
MaxActiveOverridesPerTenant = 50
};
_service = new InMemoryOperatorOverrideService(
_auditLogger.Object,
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryOperatorOverrideService>.Instance);
}
[Fact]
public async Task CreateOverrideAsync_CreatesNewOverride()
{
// Arrange
var request = new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Emergency deployment requiring immediate notifications",
Duration = TimeSpan.FromHours(2)
};
// Act
var @override = await _service.CreateOverrideAsync("tenant1", request, "admin@example.com");
// Assert
Assert.NotNull(@override);
Assert.StartsWith("ovr-", @override.OverrideId);
Assert.Equal("tenant1", @override.TenantId);
Assert.Equal(OverrideType.All, @override.Type);
Assert.Equal("Emergency deployment requiring immediate notifications", @override.Reason);
Assert.Equal(OverrideStatus.Active, @override.Status);
Assert.Equal("admin@example.com", @override.CreatedBy);
Assert.Equal(_timeProvider.GetUtcNow() + TimeSpan.FromHours(2), @override.ExpiresAt);
}
[Fact]
public async Task CreateOverrideAsync_RejectsDurationTooLong()
{
// Arrange
var request = new OperatorOverrideCreate
{
Type = OverrideType.QuietHours,
Reason = "Very long override",
Duration = TimeSpan.FromHours(48) // Exceeds max 24 hours
};
// Act & Assert
await Assert.ThrowsAsync<ArgumentException>(() =>
_service.CreateOverrideAsync("tenant1", request, "admin"));
}
[Fact]
public async Task CreateOverrideAsync_RejectsDurationTooShort()
{
// Arrange
var request = new OperatorOverrideCreate
{
Type = OverrideType.QuietHours,
Reason = "Very short override",
Duration = TimeSpan.FromMinutes(1) // Below min 5 minutes
};
// Act & Assert
await Assert.ThrowsAsync<ArgumentException>(() =>
_service.CreateOverrideAsync("tenant1", request, "admin"));
}
[Fact]
public async Task CreateOverrideAsync_LogsAuditEntry()
{
// Arrange
var request = new OperatorOverrideCreate
{
Type = OverrideType.QuietHours,
Reason = "Test override for audit",
Duration = TimeSpan.FromHours(1)
};
// Act
await _service.CreateOverrideAsync("tenant1", request, "admin");
// Assert
_auditLogger.Verify(a => a.LogAsync(
It.Is<SuppressionAuditEntry>(e =>
e.Action == SuppressionAuditAction.OverrideCreated &&
e.Actor == "admin" &&
e.TenantId == "tenant1"),
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task GetOverrideAsync_ReturnsOverrideIfExists()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.Throttle,
Reason = "Test override",
Duration = TimeSpan.FromHours(1)
}, "admin");
// Act
var retrieved = await _service.GetOverrideAsync("tenant1", created.OverrideId);
// Assert
Assert.NotNull(retrieved);
Assert.Equal(created.OverrideId, retrieved.OverrideId);
}
[Fact]
public async Task GetOverrideAsync_ReturnsExpiredStatusAfterExpiry()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Short override",
Duration = TimeSpan.FromMinutes(30)
}, "admin");
// Advance time past expiry
_timeProvider.Advance(TimeSpan.FromMinutes(31));
// Act
var retrieved = await _service.GetOverrideAsync("tenant1", created.OverrideId);
// Assert
Assert.NotNull(retrieved);
Assert.Equal(OverrideStatus.Expired, retrieved.Status);
}
[Fact]
public async Task ListActiveOverridesAsync_ReturnsOnlyActiveOverrides()
{
// Arrange
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Override 1",
Duration = TimeSpan.FromHours(2)
}, "admin");
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.QuietHours,
Reason = "Override 2 (short)",
Duration = TimeSpan.FromMinutes(10)
}, "admin");
// Advance time so second override expires
_timeProvider.Advance(TimeSpan.FromMinutes(15));
// Act
var active = await _service.ListActiveOverridesAsync("tenant1");
// Assert
Assert.Single(active);
Assert.Equal("Override 1", active[0].Reason);
}
[Fact]
public async Task RevokeOverrideAsync_RevokesActiveOverride()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "To be revoked",
Duration = TimeSpan.FromHours(1)
}, "admin");
// Act
var revoked = await _service.RevokeOverrideAsync("tenant1", created.OverrideId, "supervisor", "No longer needed");
// Assert
Assert.True(revoked);
var retrieved = await _service.GetOverrideAsync("tenant1", created.OverrideId);
Assert.NotNull(retrieved);
Assert.Equal(OverrideStatus.Revoked, retrieved.Status);
Assert.Equal("supervisor", retrieved.RevokedBy);
Assert.Equal("No longer needed", retrieved.RevocationReason);
}
[Fact]
public async Task RevokeOverrideAsync_LogsAuditEntry()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "To be revoked",
Duration = TimeSpan.FromHours(1)
}, "admin");
// Act
await _service.RevokeOverrideAsync("tenant1", created.OverrideId, "supervisor", "Testing");
// Assert
_auditLogger.Verify(a => a.LogAsync(
It.Is<SuppressionAuditEntry>(e =>
e.Action == SuppressionAuditAction.OverrideRevoked &&
e.Actor == "supervisor"),
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task CheckOverrideAsync_ReturnsMatchingOverride()
{
// Arrange
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.QuietHours,
Reason = "Deployment override",
Duration = TimeSpan.FromHours(1)
}, "admin");
// Act
var result = await _service.CheckOverrideAsync("tenant1", "deployment.complete", null);
// Assert
Assert.True(result.HasOverride);
Assert.NotNull(result.Override);
Assert.Equal(OverrideType.QuietHours, result.BypassedTypes);
}
[Fact]
public async Task CheckOverrideAsync_ReturnsNoOverrideWhenNoneMatch()
{
// Act
var result = await _service.CheckOverrideAsync("tenant1", "event.test", null);
// Assert
Assert.False(result.HasOverride);
Assert.Null(result.Override);
}
[Fact]
public async Task CheckOverrideAsync_RespectsEventKindFilter()
{
// Arrange
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Only for deployments",
Duration = TimeSpan.FromHours(1),
EventKinds = ["deployment.", "release."]
}, "admin");
// Act
var deploymentResult = await _service.CheckOverrideAsync("tenant1", "deployment.started", null);
var otherResult = await _service.CheckOverrideAsync("tenant1", "vulnerability.found", null);
// Assert
Assert.True(deploymentResult.HasOverride);
Assert.False(otherResult.HasOverride);
}
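// Illustrative sketch of the prefix semantics the test above relies on: a filter such as
// "deployment." matches "deployment.started" but not "vulnerability.found".
// MatchesEventKind is a hypothetical helper written for this example; the service's real
// matching logic is not shown here and may differ (for example in case handling).
private static bool MatchesEventKind(string[] prefixes, string eventKind) =>
prefixes.Length == 0 ||
Array.Exists(prefixes, p => eventKind.StartsWith(p, StringComparison.Ordinal));
[Fact]
public void MatchesEventKind_Sketch_UsesPrefixMatching()
{
string[] prefixes = ["deployment.", "release."];
Assert.True(MatchesEventKind(prefixes, "deployment.started"));
Assert.False(MatchesEventKind(prefixes, "vulnerability.found"));
// Assumption: an empty filter matches every event kind
Assert.True(MatchesEventKind([], "vulnerability.found"));
}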
[Fact]
public async Task CheckOverrideAsync_RespectsCorrelationKeyFilter()
{
// Arrange
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.Throttle,
Reason = "Specific incident",
Duration = TimeSpan.FromHours(1),
CorrelationKeys = ["incident-123", "incident-456"]
}, "admin");
// Act
var matchingResult = await _service.CheckOverrideAsync("tenant1", "event.test", "incident-123");
var nonMatchingResult = await _service.CheckOverrideAsync("tenant1", "event.test", "incident-789");
// Assert
Assert.True(matchingResult.HasOverride);
Assert.False(nonMatchingResult.HasOverride);
}
[Fact]
public async Task RecordOverrideUsageAsync_IncrementsUsageCount()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Limited use override",
Duration = TimeSpan.FromHours(1),
MaxUsageCount = 5
}, "admin");
// Act
await _service.RecordOverrideUsageAsync("tenant1", created.OverrideId, "event.test");
await _service.RecordOverrideUsageAsync("tenant1", created.OverrideId, "event.test");
// Assert
var updated = await _service.GetOverrideAsync("tenant1", created.OverrideId);
Assert.NotNull(updated);
Assert.Equal(2, updated.UsageCount);
}
[Fact]
public async Task RecordOverrideUsageAsync_ExhaustsOverrideAtMaxUsage()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Two-use override",
Duration = TimeSpan.FromHours(1),
MaxUsageCount = 2
}, "admin");
// Act
await _service.RecordOverrideUsageAsync("tenant1", created.OverrideId, "event.test");
await _service.RecordOverrideUsageAsync("tenant1", created.OverrideId, "event.test");
// Assert
var updated = await _service.GetOverrideAsync("tenant1", created.OverrideId);
Assert.NotNull(updated);
Assert.Equal(OverrideStatus.Exhausted, updated.Status);
}
[Fact]
public async Task RecordOverrideUsageAsync_LogsAuditEntry()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Override for audit test",
Duration = TimeSpan.FromHours(1)
}, "admin");
// Act
await _service.RecordOverrideUsageAsync("tenant1", created.OverrideId, "event.test");
// Assert
_auditLogger.Verify(a => a.LogAsync(
It.Is<SuppressionAuditEntry>(e =>
e.Action == SuppressionAuditAction.OverrideUsed &&
e.ResourceId == created.OverrideId),
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task CheckOverrideAsync_DoesNotReturnExhaustedOverride()
{
// Arrange
var created = await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Single use",
Duration = TimeSpan.FromHours(1),
MaxUsageCount = 1
}, "admin");
await _service.RecordOverrideUsageAsync("tenant1", created.OverrideId, "event.test");
// Act
var result = await _service.CheckOverrideAsync("tenant1", "event.other", null);
// Assert
Assert.False(result.HasOverride);
}
[Fact]
public async Task CreateOverrideAsync_WithDeferredEffectiveFrom()
{
// Arrange
var futureTime = _timeProvider.GetUtcNow().AddHours(1);
var request = new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Future override",
Duration = TimeSpan.FromHours(2),
EffectiveFrom = futureTime
};
// Act
var created = await _service.CreateOverrideAsync("tenant1", request, "admin");
// Assert
Assert.Equal(futureTime, created.EffectiveFrom);
Assert.Equal(futureTime + TimeSpan.FromHours(2), created.ExpiresAt);
}
[Fact]
public async Task CheckOverrideAsync_DoesNotReturnNotYetEffectiveOverride()
{
// Arrange
var futureTime = _timeProvider.GetUtcNow().AddHours(1);
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.All,
Reason = "Future override",
Duration = TimeSpan.FromHours(2),
EffectiveFrom = futureTime
}, "admin");
// Act (before effective time)
var result = await _service.CheckOverrideAsync("tenant1", "event.test", null);
// Assert
Assert.False(result.HasOverride);
}
[Fact]
public async Task OverrideType_Flags_WorkCorrectly()
{
// Arrange
await _service.CreateOverrideAsync("tenant1", new OperatorOverrideCreate
{
Type = OverrideType.QuietHours | OverrideType.Throttle, // Multiple types
Reason = "Partial override",
Duration = TimeSpan.FromHours(1)
}, "admin");
// Act
var result = await _service.CheckOverrideAsync("tenant1", "event.test", null);
// Assert
Assert.True(result.HasOverride);
Assert.True(result.BypassedTypes.HasFlag(OverrideType.QuietHours));
Assert.True(result.BypassedTypes.HasFlag(OverrideType.Throttle));
Assert.False(result.BypassedTypes.HasFlag(OverrideType.Maintenance));
}
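// Stand-alone illustration of the [Flags] arithmetic asserted above. SketchOverrideType
// and its numeric values are hypothetical; the real OverrideType enum is defined
// elsewhere and its values are not shown in this file.
[Flags]
private enum SketchOverrideType
{
None = 0,
QuietHours = 1,
Throttle = 2,
Maintenance = 4
}
[Fact]
public void FlagsEnum_Sketch_CombinesAndTestsBits()
{
// Combining two flags sets both bits and leaves the third clear
var combined = SketchOverrideType.QuietHours | SketchOverrideType.Throttle;
Assert.True(combined.HasFlag(SketchOverrideType.QuietHours));
Assert.True(combined.HasFlag(SketchOverrideType.Throttle));
Assert.False(combined.HasFlag(SketchOverrideType.Maintenance));
}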
}


@@ -0,0 +1,402 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class QuietHourCalendarServiceTests
{
private readonly Mock<ISuppressionAuditLogger> _auditLogger;
private readonly FakeTimeProvider _timeProvider;
private readonly InMemoryQuietHourCalendarService _service;
public QuietHourCalendarServiceTests()
{
_auditLogger = new Mock<ISuppressionAuditLogger>();
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 14, 0, 0, TimeSpan.Zero)); // Monday 2pm UTC
_service = new InMemoryQuietHourCalendarService(
_auditLogger.Object,
_timeProvider,
NullLogger<InMemoryQuietHourCalendarService>.Instance);
}
[Fact]
public async Task CreateCalendarAsync_CreatesNewCalendar()
{
// Arrange
var request = new QuietHourCalendarCreate
{
Name = "Night Quiet Hours",
Description = "Suppress notifications overnight",
Schedules =
[
new CalendarSchedule
{
Name = "Overnight",
StartTime = "22:00",
EndTime = "08:00"
}
]
};
// Act
var calendar = await _service.CreateCalendarAsync("tenant1", request, "admin@example.com");
// Assert
Assert.NotNull(calendar);
Assert.StartsWith("cal-", calendar.CalendarId);
Assert.Equal("tenant1", calendar.TenantId);
Assert.Equal("Night Quiet Hours", calendar.Name);
Assert.True(calendar.Enabled);
Assert.Single(calendar.Schedules);
Assert.Equal("admin@example.com", calendar.CreatedBy);
}
[Fact]
public async Task CreateCalendarAsync_LogsAuditEntry()
{
// Arrange
var request = new QuietHourCalendarCreate
{
Name = "Test Calendar"
};
// Act
await _service.CreateCalendarAsync("tenant1", request, "admin");
// Assert
_auditLogger.Verify(a => a.LogAsync(
It.Is<SuppressionAuditEntry>(e =>
e.Action == SuppressionAuditAction.CalendarCreated &&
e.Actor == "admin" &&
e.TenantId == "tenant1"),
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task ListCalendarsAsync_ReturnsAllCalendarsForTenant()
{
// Arrange
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate { Name = "Calendar 1", Priority = 50 }, "admin");
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate { Name = "Calendar 2", Priority = 100 }, "admin");
await _service.CreateCalendarAsync("tenant2", new QuietHourCalendarCreate { Name = "Other Tenant" }, "admin");
// Act
var calendars = await _service.ListCalendarsAsync("tenant1");
// Assert
Assert.Equal(2, calendars.Count);
Assert.Equal("Calendar 1", calendars[0].Name); // Lower Priority value (higher precedence) sorts first
Assert.Equal("Calendar 2", calendars[1].Name);
}
[Fact]
public async Task GetCalendarAsync_ReturnsCalendarIfExists()
{
// Arrange
var created = await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate { Name = "Test" }, "admin");
// Act
var retrieved = await _service.GetCalendarAsync("tenant1", created.CalendarId);
// Assert
Assert.NotNull(retrieved);
Assert.Equal(created.CalendarId, retrieved.CalendarId);
Assert.Equal("Test", retrieved.Name);
}
[Fact]
public async Task GetCalendarAsync_ReturnsNullIfNotExists()
{
// Act
var result = await _service.GetCalendarAsync("tenant1", "nonexistent");
// Assert
Assert.Null(result);
}
[Fact]
public async Task UpdateCalendarAsync_UpdatesExistingCalendar()
{
// Arrange
var created = await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate { Name = "Original" }, "admin");
var update = new QuietHourCalendarUpdate
{
Name = "Updated",
Enabled = false
};
// Act
var updated = await _service.UpdateCalendarAsync("tenant1", created.CalendarId, update, "other-admin");
// Assert
Assert.NotNull(updated);
Assert.Equal("Updated", updated.Name);
Assert.False(updated.Enabled);
Assert.Equal("other-admin", updated.UpdatedBy);
}
[Fact]
public async Task DeleteCalendarAsync_RemovesCalendar()
{
// Arrange
var created = await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate { Name = "ToDelete" }, "admin");
// Act
var deleted = await _service.DeleteCalendarAsync("tenant1", created.CalendarId, "admin");
// Assert
Assert.True(deleted);
var retrieved = await _service.GetCalendarAsync("tenant1", created.CalendarId);
Assert.Null(retrieved);
}
[Fact]
public async Task EvaluateCalendarsAsync_SuppressesWhenInQuietHours()
{
// Arrange - Create calendar with quiet hours from 10pm to 8am
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Night Hours",
Schedules =
[
new CalendarSchedule
{
Name = "Overnight",
StartTime = "22:00",
EndTime = "08:00"
}
]
}, "admin");
// Set time to 23:00 (11pm) - within quiet hours
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 23, 0, 0, TimeSpan.Zero));
// Act
var result = await _service.EvaluateCalendarsAsync("tenant1", "vulnerability.found", null);
// Assert
Assert.True(result.IsSuppressed);
Assert.Equal("Night Hours", result.CalendarName);
Assert.Equal("Overnight", result.ScheduleName);
}
[Fact]
public async Task EvaluateCalendarsAsync_DoesNotSuppressOutsideQuietHours()
{
// Arrange - Create calendar with quiet hours from 10pm to 8am
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Night Hours",
Schedules =
[
new CalendarSchedule
{
Name = "Overnight",
StartTime = "22:00",
EndTime = "08:00"
}
]
}, "admin");
// Time is 2pm (14:00) - outside quiet hours
// Act
var result = await _service.EvaluateCalendarsAsync("tenant1", "vulnerability.found", null);
// Assert
Assert.False(result.IsSuppressed);
}
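// Illustrative sketch of how an overnight schedule such as 22:00-08:00 can be evaluated.
// IsWithinWindow is a hypothetical helper for this example only; the calendar service's
// actual evaluation (time zones, end-bound inclusivity) is not shown here.
private static bool IsWithinWindow(TimeOnly start, TimeOnly end, TimeOnly now) =>
start <= end
? now >= start && now < end // same-day window, e.g. 09:00-17:00
: now >= start || now < end; // wraps past midnight, e.g. 22:00-08:00
[Fact]
public void IsWithinWindow_Sketch_HandlesOvernightWrap()
{
var start = new TimeOnly(22, 0);
var end = new TimeOnly(8, 0);
Assert.True(IsWithinWindow(start, end, new TimeOnly(23, 0))); // 11pm, as in the suppressed case
Assert.True(IsWithinWindow(start, end, new TimeOnly(7, 59))); // early morning, still inside
Assert.False(IsWithinWindow(start, end, new TimeOnly(14, 0))); // 2pm, as in the unsuppressed case
}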
[Fact]
public async Task EvaluateCalendarsAsync_RespectsExcludedEventKinds()
{
// Arrange
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Night Hours",
ExcludedEventKinds = ["critical.", "urgent."],
Schedules =
[
new CalendarSchedule
{
Name = "Overnight",
StartTime = "22:00",
EndTime = "08:00"
}
]
}, "admin");
// Set time to 23:00 (11pm) - within quiet hours
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 23, 0, 0, TimeSpan.Zero));
// Act
var criticalResult = await _service.EvaluateCalendarsAsync("tenant1", "critical.security.breach", null);
var normalResult = await _service.EvaluateCalendarsAsync("tenant1", "info.scan.complete", null);
// Assert
Assert.False(criticalResult.IsSuppressed); // Critical events not suppressed
Assert.True(normalResult.IsSuppressed); // Normal events suppressed
}
[Fact]
public async Task EvaluateCalendarsAsync_RespectsEventKindFilters()
{
// Arrange
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Scan Quiet Hours",
EventKinds = ["scan."], // Only applies to scan events
Schedules =
[
new CalendarSchedule
{
Name = "Always",
StartTime = "00:00",
EndTime = "23:59"
}
]
}, "admin");
// Act
var scanResult = await _service.EvaluateCalendarsAsync("tenant1", "scan.complete", null);
var otherResult = await _service.EvaluateCalendarsAsync("tenant1", "vulnerability.found", null);
// Assert
Assert.True(scanResult.IsSuppressed);
Assert.False(otherResult.IsSuppressed);
}
[Fact]
public async Task EvaluateCalendarsAsync_RespectsScopes()
{
// Arrange
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Team A Quiet Hours",
Scopes = ["team-a", "team-b"],
Schedules =
[
new CalendarSchedule
{
Name = "All Day",
StartTime = "00:00",
EndTime = "23:59"
}
]
}, "admin");
// Act
var teamAResult = await _service.EvaluateCalendarsAsync("tenant1", "event.test", ["team-a"]);
var teamCResult = await _service.EvaluateCalendarsAsync("tenant1", "event.test", ["team-c"]);
// Assert
Assert.True(teamAResult.IsSuppressed);
Assert.False(teamCResult.IsSuppressed);
}
[Fact]
public async Task EvaluateCalendarsAsync_RespectsDaysOfWeek()
{
// Arrange - Create calendar that only applies on weekends
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Weekend Hours",
Schedules =
[
new CalendarSchedule
{
Name = "Weekend Only",
StartTime = "00:00",
EndTime = "23:59",
DaysOfWeek = [0, 6] // Sunday and Saturday
}
]
}, "admin");
// Monday (current time is Monday)
var mondayResult = await _service.EvaluateCalendarsAsync("tenant1", "event.test", null);
// Set to Saturday
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 20, 14, 0, 0, TimeSpan.Zero));
var saturdayResult = await _service.EvaluateCalendarsAsync("tenant1", "event.test", null);
// Assert
Assert.False(mondayResult.IsSuppressed);
Assert.True(saturdayResult.IsSuppressed);
}
[Fact]
public async Task EvaluateCalendarsAsync_DisabledCalendarDoesNotSuppress()
{
// Arrange
var created = await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Night Hours",
Schedules =
[
new CalendarSchedule
{
Name = "All Day",
StartTime = "00:00",
EndTime = "23:59"
}
]
}, "admin");
// Disable the calendar
await _service.UpdateCalendarAsync("tenant1", created.CalendarId, new QuietHourCalendarUpdate { Enabled = false }, "admin");
// Act
var result = await _service.EvaluateCalendarsAsync("tenant1", "event.test", null);
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateCalendarsAsync_HigherPriorityCalendarWins()
{
// Arrange - Create two calendars with different priorities
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "Low Priority",
Priority = 100,
ExcludedEventKinds = ["critical."], // This one excludes critical
Schedules =
[
new CalendarSchedule
{
Name = "All Day",
StartTime = "00:00",
EndTime = "23:59"
}
]
}, "admin");
await _service.CreateCalendarAsync("tenant1", new QuietHourCalendarCreate
{
Name = "High Priority",
Priority = 10, // Higher priority (lower number)
Schedules =
[
new CalendarSchedule
{
Name = "All Day",
StartTime = "00:00",
EndTime = "23:59"
}
]
}, "admin");
// Act
var result = await _service.EvaluateCalendarsAsync("tenant1", "critical.alert", null);
// Assert
Assert.True(result.IsSuppressed);
Assert.Equal("High Priority", result.CalendarName); // High priority calendar applies first
}
}


@@ -0,0 +1,349 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notify.Storage.Mongo.Repositories;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class QuietHoursCalendarServiceTests
{
private readonly Mock<INotifyAuditRepository> _auditRepository;
private readonly FakeTimeProvider _timeProvider;
private readonly InMemoryQuietHoursCalendarService _service;
public QuietHoursCalendarServiceTests()
{
_auditRepository = new Mock<INotifyAuditRepository>();
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 14, 30, 0, TimeSpan.Zero)); // Monday 14:30 UTC
_service = new InMemoryQuietHoursCalendarService(
_auditRepository.Object,
_timeProvider,
NullLogger<InMemoryQuietHoursCalendarService>.Instance);
}
[Fact]
public async Task ListCalendarsAsync_EmptyTenant_ReturnsEmptyList()
{
// Act
var result = await _service.ListCalendarsAsync("tenant1");
// Assert
Assert.Empty(result);
}
[Fact]
public async Task UpsertCalendarAsync_NewCalendar_CreatesCalendar()
{
// Arrange
var calendar = CreateTestCalendar("cal-1", "tenant1");
// Act
var result = await _service.UpsertCalendarAsync(calendar, "admin");
// Assert
Assert.Equal("cal-1", result.CalendarId);
Assert.Equal("tenant1", result.TenantId);
Assert.Equal(_timeProvider.GetUtcNow(), result.CreatedAt);
Assert.Equal("admin", result.CreatedBy);
}
[Fact]
public async Task UpsertCalendarAsync_ExistingCalendar_UpdatesCalendar()
{
// Arrange
var calendar = CreateTestCalendar("cal-1", "tenant1");
await _service.UpsertCalendarAsync(calendar, "admin");
_timeProvider.Advance(TimeSpan.FromMinutes(5));
var updated = calendar with { Name = "Updated Name" };
// Act
var result = await _service.UpsertCalendarAsync(updated, "admin2");
// Assert
Assert.Equal("Updated Name", result.Name);
Assert.Equal("admin", result.CreatedBy); // Original creator preserved
Assert.Equal("admin2", result.UpdatedBy);
}
[Fact]
public async Task GetCalendarAsync_ExistingCalendar_ReturnsCalendar()
{
// Arrange
var calendar = CreateTestCalendar("cal-1", "tenant1");
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.GetCalendarAsync("tenant1", "cal-1");
// Assert
Assert.NotNull(result);
Assert.Equal("cal-1", result.CalendarId);
}
[Fact]
public async Task GetCalendarAsync_NonExistentCalendar_ReturnsNull()
{
// Act
var result = await _service.GetCalendarAsync("tenant1", "nonexistent");
// Assert
Assert.Null(result);
}
[Fact]
public async Task DeleteCalendarAsync_ExistingCalendar_ReturnsTrue()
{
// Arrange
var calendar = CreateTestCalendar("cal-1", "tenant1");
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.DeleteCalendarAsync("tenant1", "cal-1", "admin");
// Assert
Assert.True(result);
Assert.Null(await _service.GetCalendarAsync("tenant1", "cal-1"));
}
[Fact]
public async Task DeleteCalendarAsync_NonExistentCalendar_ReturnsFalse()
{
// Act
var result = await _service.DeleteCalendarAsync("tenant1", "nonexistent", "admin");
// Assert
Assert.False(result);
}
[Fact]
public async Task EvaluateAsync_NoCalendars_ReturnsNotActive()
{
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert
Assert.False(result.IsActive);
}
[Fact]
public async Task EvaluateAsync_DisabledCalendar_ReturnsNotActive()
{
// Arrange
var calendar = CreateTestCalendar("cal-1", "tenant1") with { Enabled = false };
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert
Assert.False(result.IsActive);
}
[Fact]
public async Task EvaluateAsync_WithinQuietHours_ReturnsActive()
{
// Arrange - Set time to 22:30 UTC (within 22:00-08:00 quiet hours)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 22, 30, 0, TimeSpan.Zero));
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "22:00", endTime: "08:00");
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert
Assert.True(result.IsActive);
Assert.Equal("cal-1", result.MatchedCalendarId);
Assert.NotNull(result.EndsAt);
}
[Fact]
public async Task EvaluateAsync_OutsideQuietHours_ReturnsNotActive()
{
// Arrange - Time is 14:30 UTC (outside 22:00-08:00 quiet hours)
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "22:00", endTime: "08:00");
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert
Assert.False(result.IsActive);
}
[Fact]
public async Task EvaluateAsync_WithExcludedEventKind_ReturnsNotActive()
{
// Arrange - Set time within quiet hours
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 23, 0, 0, TimeSpan.Zero));
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "22:00", endTime: "08:00") with
{
ExcludedEventKinds = new[] { "critical." }
};
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "critical.alert");
// Assert
Assert.False(result.IsActive);
}
[Fact]
public async Task EvaluateAsync_WithIncludedEventKind_OnlyMatchesIncluded()
{
// Arrange - Set time within quiet hours
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 23, 0, 0, TimeSpan.Zero));
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "22:00", endTime: "08:00") with
{
IncludedEventKinds = new[] { "info." }
};
await _service.UpsertCalendarAsync(calendar, "admin");
// Act - Test included event kind
var resultIncluded = await _service.EvaluateAsync("tenant1", "info.status");
// Act - Test non-included event kind
var resultExcluded = await _service.EvaluateAsync("tenant1", "warning.alert");
// Assert
Assert.True(resultIncluded.IsActive);
Assert.False(resultExcluded.IsActive);
}
[Fact]
public async Task EvaluateAsync_WithDayOfWeekRestriction_OnlyMatchesSpecifiedDays()
{
// Arrange - the fake clock starts on Monday 2024-01-15 (day 1), which the schedule below does not include
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "00:00", endTime: "23:59") with
{
Schedules = new[]
{
new QuietHoursScheduleEntry
{
Name = "Weekends Only",
StartTime = "00:00",
EndTime = "23:59",
DaysOfWeek = new[] { 0, 6 }, // Sunday, Saturday
Enabled = true
}
}
};
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert - Should not be active on Monday
Assert.False(result.IsActive);
}
[Fact]
public async Task EvaluateAsync_PriorityOrdering_ReturnsHighestPriority()
{
// Arrange - Set time within quiet hours
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 23, 0, 0, TimeSpan.Zero));
var calendar1 = CreateTestCalendar("cal-low", "tenant1", startTime: "22:00", endTime: "08:00") with
{
Name = "Low Priority",
Priority = 100
};
var calendar2 = CreateTestCalendar("cal-high", "tenant1", startTime: "22:00", endTime: "08:00") with
{
Name = "High Priority",
Priority = 10
};
await _service.UpsertCalendarAsync(calendar1, "admin");
await _service.UpsertCalendarAsync(calendar2, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert - Should match higher priority (lower number)
Assert.True(result.IsActive);
Assert.Equal("cal-high", result.MatchedCalendarId);
}
[Fact]
public async Task EvaluateAsync_SameDayWindow_EvaluatesCorrectly()
{
// Arrange - Set time to 10:30 UTC (within 09:00-17:00 business hours)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 15, 10, 30, 0, TimeSpan.Zero));
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "09:00", endTime: "17:00");
await _service.UpsertCalendarAsync(calendar, "admin");
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test");
// Assert
Assert.True(result.IsActive);
}
[Fact]
public async Task EvaluateAsync_WithCustomEvaluationTime_UsesProvidedTime()
{
// Arrange - Current time is 14:30, but we evaluate at 23:00
var calendar = CreateTestCalendar("cal-1", "tenant1", startTime: "22:00", endTime: "08:00");
await _service.UpsertCalendarAsync(calendar, "admin");
var evaluationTime = new DateTimeOffset(2024, 1, 15, 23, 0, 0, TimeSpan.Zero);
// Act
var result = await _service.EvaluateAsync("tenant1", "event.test", evaluationTime);
// Assert
Assert.True(result.IsActive);
}
[Fact]
public async Task ListCalendarsAsync_ReturnsOrderedByPriority()
{
// Arrange
await _service.UpsertCalendarAsync(
CreateTestCalendar("cal-3", "tenant1") with { Priority = 300 }, "admin");
await _service.UpsertCalendarAsync(
CreateTestCalendar("cal-1", "tenant1") with { Priority = 100 }, "admin");
await _service.UpsertCalendarAsync(
CreateTestCalendar("cal-2", "tenant1") with { Priority = 200 }, "admin");
// Act
var result = await _service.ListCalendarsAsync("tenant1");
// Assert
Assert.Equal(3, result.Count);
Assert.Equal("cal-1", result[0].CalendarId);
Assert.Equal("cal-2", result[1].CalendarId);
Assert.Equal("cal-3", result[2].CalendarId);
}
private static QuietHoursCalendar CreateTestCalendar(
string calendarId,
string tenantId,
string startTime = "22:00",
string endTime = "08:00") => new()
{
CalendarId = calendarId,
TenantId = tenantId,
Name = $"Test Calendar {calendarId}",
Enabled = true,
Priority = 100,
Schedules = new[]
{
new QuietHoursScheduleEntry
{
Name = "Default Schedule",
StartTime = startTime,
EndTime = endTime,
Enabled = true
}
}
};
}
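// Illustrative sketch only, not part of this commit: one way the same-day and
// overnight windows exercised by the tests above ("09:00"-"17:00", "22:00"-"08:00")
// could be evaluated. QuietWindowSketch and IsWithinWindow are hypothetical names,
// and the sketch assumes "HH:mm" strings interpreted in UTC with an exclusive end;
// the real InMemoryQuietHoursCalendarService may behave differently.
internal static class QuietWindowSketch
{
    public static bool IsWithinWindow(System.DateTimeOffset now, string startTime, string endTime)
    {
        var culture = System.Globalization.CultureInfo.InvariantCulture;
        var start = System.TimeSpan.ParseExact(startTime, @"hh\:mm", culture);
        var end = System.TimeSpan.ParseExact(endTime, @"hh\:mm", culture);
        var current = now.UtcDateTime.TimeOfDay;

        // Same-day window (for example 09:00-17:00): start <= current < end.
        if (start <= end)
        {
            return current >= start && current < end;
        }

        // Overnight window (for example 22:00-08:00): the window wraps midnight,
        // so it is active late in the evening or early in the morning.
        return current >= start || current < end;
    }
}
// Under these assumptions, 22:30 and 06:00 fall inside "22:00"-"08:00", while 14:30 does not.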

@@ -0,0 +1,466 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class QuietHoursEvaluatorTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly QuietHoursOptions _options;
private readonly QuietHoursEvaluator _evaluator;
public QuietHoursEvaluatorTests()
{
// Start at 10:00 AM UTC on a Wednesday
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 10, 10, 0, 0, TimeSpan.Zero));
_options = new QuietHoursOptions { Enabled = true };
_evaluator = CreateEvaluator();
}
private QuietHoursEvaluator CreateEvaluator()
{
return new QuietHoursEvaluator(
Options.Create(_options),
_timeProvider,
NullLogger<QuietHoursEvaluator>.Instance);
}
[Fact]
public async Task EvaluateAsync_NoSchedule_ReturnsNotSuppressed()
{
// Arrange
_options.Schedule = null;
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_DisabledSchedule_ReturnsNotSuppressed()
{
// Arrange
_options.Schedule = new QuietHoursSchedule { Enabled = false };
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_DisabledGlobally_ReturnsNotSuppressed()
{
// Arrange
_options.Enabled = false;
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "00:00",
EndTime = "23:59"
};
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_WithinSameDayQuietHours_ReturnsSuppressed()
{
// Arrange - set time to 14:00 (2 PM)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 14, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "12:00",
EndTime = "18:00"
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.True(result.IsSuppressed);
Assert.Equal("quiet_hours", result.SuppressionType);
Assert.Contains("Quiet hours", result.Reason);
}
[Fact]
public async Task EvaluateAsync_OutsideSameDayQuietHours_ReturnsNotSuppressed()
{
// Arrange - set time to 10:00 (10 AM)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 10, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "12:00",
EndTime = "18:00"
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_WithinOvernightQuietHours_Morning_ReturnsSuppressed()
{
// Arrange - set time to 06:00 (6 AM)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 6, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "22:00",
EndTime = "08:00"
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.True(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_WithinOvernightQuietHours_Evening_ReturnsSuppressed()
{
// Arrange - set time to 23:00 (11 PM)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 23, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "22:00",
EndTime = "08:00"
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.True(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_OutsideOvernightQuietHours_ReturnsNotSuppressed()
{
// Arrange - set time to 12:00 (noon)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 12, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "22:00",
EndTime = "08:00"
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_DayOfWeekFilter_AppliesCorrectly()
{
// Arrange - Wednesday (day 3)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 14, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "00:00",
EndTime = "23:59",
DaysOfWeek = [0, 6] // Sunday, Saturday only
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert - Wednesday is not in the list
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_DayOfWeekIncluded_ReturnsSuppressed()
{
// Arrange - Wednesday (day 3)
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 14, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "00:00",
EndTime = "23:59",
DaysOfWeek = [3] // Wednesday
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.True(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_ExcludedEventKind_ReturnsNotSuppressed()
{
// Arrange
_timeProvider.SetUtcNow(new DateTimeOffset(2024, 1, 10, 14, 0, 0, TimeSpan.Zero));
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "00:00",
EndTime = "23:59",
ExcludedEventKinds = ["security", "critical"]
};
var evaluator = CreateEvaluator();
// Act
var result = await evaluator.EvaluateAsync("tenant1", "security.alert");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_MaintenanceWindow_Active_ReturnsSuppressed()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now.AddHours(-1),
EndTime = now.AddHours(1),
Description = "Scheduled maintenance"
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.True(result.IsSuppressed);
Assert.Equal("maintenance", result.SuppressionType);
Assert.Contains("Scheduled maintenance", result.Reason);
}
[Fact]
public async Task EvaluateAsync_MaintenanceWindow_NotActive_ReturnsNotSuppressed()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now.AddHours(1),
EndTime = now.AddHours(2),
Description = "Future maintenance"
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_MaintenanceWindow_DifferentTenant_ReturnsNotSuppressed()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now.AddHours(-1),
EndTime = now.AddHours(1)
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
var result = await _evaluator.EvaluateAsync("tenant2", "test.event");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_MaintenanceWindow_AffectedEventKind_ReturnsSuppressed()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now.AddHours(-1),
EndTime = now.AddHours(1),
AffectedEventKinds = ["scanner", "monitor"]
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "scanner.complete");
// Assert
Assert.True(result.IsSuppressed);
}
[Fact]
public async Task EvaluateAsync_MaintenanceWindow_UnaffectedEventKind_ReturnsNotSuppressed()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now.AddHours(-1),
EndTime = now.AddHours(1),
AffectedEventKinds = ["scanner", "monitor"]
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
var result = await _evaluator.EvaluateAsync("tenant1", "security.alert");
// Assert
Assert.False(result.IsSuppressed);
}
[Fact]
public async Task AddMaintenanceWindowAsync_AddsWindow()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now,
EndTime = now.AddHours(2)
};
// Act
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Assert
var windows = await _evaluator.ListMaintenanceWindowsAsync("tenant1");
Assert.Single(windows);
Assert.Equal("maint-1", windows[0].WindowId);
}
[Fact]
public async Task RemoveMaintenanceWindowAsync_RemovesWindow()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now,
EndTime = now.AddHours(2)
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
await _evaluator.RemoveMaintenanceWindowAsync("tenant1", "maint-1");
// Assert
var windows = await _evaluator.ListMaintenanceWindowsAsync("tenant1");
Assert.Empty(windows);
}
[Fact]
public async Task ListMaintenanceWindowsAsync_ExcludesExpiredWindows()
{
// Arrange
var now = _timeProvider.GetUtcNow();
var activeWindow = new MaintenanceWindow
{
WindowId = "maint-active",
TenantId = "tenant1",
StartTime = now.AddHours(-1),
EndTime = now.AddHours(1)
};
var expiredWindow = new MaintenanceWindow
{
WindowId = "maint-expired",
TenantId = "tenant1",
StartTime = now.AddHours(-3),
EndTime = now.AddHours(-1)
};
await _evaluator.AddMaintenanceWindowAsync("tenant1", activeWindow);
await _evaluator.AddMaintenanceWindowAsync("tenant1", expiredWindow);
// Act
var windows = await _evaluator.ListMaintenanceWindowsAsync("tenant1");
// Assert
Assert.Single(windows);
Assert.Equal("maint-active", windows[0].WindowId);
}
[Fact]
public async Task EvaluateAsync_MaintenanceHasPriorityOverQuietHours()
{
// Arrange - setup both maintenance and quiet hours
var now = _timeProvider.GetUtcNow();
_options.Schedule = new QuietHoursSchedule
{
Enabled = true,
StartTime = "00:00",
EndTime = "23:59"
};
var evaluator = CreateEvaluator();
var window = new MaintenanceWindow
{
WindowId = "maint-1",
TenantId = "tenant1",
StartTime = now.AddHours(-1),
EndTime = now.AddHours(1),
Description = "System upgrade"
};
await evaluator.AddMaintenanceWindowAsync("tenant1", window);
// Act
var result = await evaluator.EvaluateAsync("tenant1", "test.event");
// Assert - maintenance should take priority
Assert.True(result.IsSuppressed);
Assert.Equal("maintenance", result.SuppressionType);
}
}
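// Illustrative sketch only, not part of this commit: the tests above pass entries such as
// "security" or "critical." and expect "security.alert" or "critical.alert" to match, which
// is consistent with prefix matching. EventKindPrefixSketch is a hypothetical helper, and
// the case-insensitive comparison is an assumption; the real QuietHoursEvaluator may
// match event kinds differently.
internal static class EventKindPrefixSketch
{
    public static bool MatchesAnyPrefix(
        string eventKind,
        System.Collections.Generic.IEnumerable<string> prefixes)
    {
        foreach (var prefix in prefixes)
        {
            // An event kind matches when it starts with one of the configured prefixes.
            if (eventKind.StartsWith(prefix, System.StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }

        return false;
    }
}
// A quiet-hours check could skip suppression when MatchesAnyPrefix(kind, excluded) is true,
// and a maintenance window could apply only when MatchesAnyPrefix(kind, affected) is true.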

@@ -0,0 +1,254 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class SuppressionAuditLoggerTests
{
private readonly SuppressionAuditOptions _options;
private readonly InMemorySuppressionAuditLogger _logger;
public SuppressionAuditLoggerTests()
{
_options = new SuppressionAuditOptions
{
MaxEntriesPerTenant = 100
};
_logger = new InMemorySuppressionAuditLogger(
Options.Create(_options),
NullLogger<InMemorySuppressionAuditLogger>.Instance);
}
[Fact]
public async Task LogAsync_StoresEntry()
{
// Arrange
var entry = CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated);
// Act
await _logger.LogAsync(entry);
// Assert
var results = await _logger.QueryAsync(new SuppressionAuditQuery { TenantId = "tenant1" });
Assert.Single(results);
Assert.Equal(entry.EntryId, results[0].EntryId);
}
[Fact]
public async Task QueryAsync_ReturnsEmptyForUnknownTenant()
{
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery { TenantId = "nonexistent" });
// Assert
Assert.Empty(results);
}
[Fact]
public async Task QueryAsync_FiltersByTimeRange()
{
// Arrange
var now = DateTimeOffset.UtcNow;
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated, now.AddHours(-3)));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarUpdated, now.AddHours(-1)));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarDeleted, now));
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
From = now.AddHours(-2),
To = now.AddMinutes(-30)
});
// Assert
Assert.Single(results);
Assert.Equal(SuppressionAuditAction.CalendarUpdated, results[0].Action);
}
[Fact]
public async Task QueryAsync_FiltersByAction()
{
// Arrange
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarUpdated));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.ThrottleConfigUpdated));
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
Actions = [SuppressionAuditAction.CalendarCreated, SuppressionAuditAction.CalendarUpdated]
});
// Assert
Assert.Equal(2, results.Count);
Assert.DoesNotContain(results, r => r.Action == SuppressionAuditAction.ThrottleConfigUpdated);
}
[Fact]
public async Task QueryAsync_FiltersByActor()
{
// Arrange
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated, actor: "admin1"));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarUpdated, actor: "admin2"));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarDeleted, actor: "admin1"));
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
Actor = "admin1"
});
// Assert
Assert.Equal(2, results.Count);
Assert.All(results, r => Assert.Equal("admin1", r.Actor));
}
[Fact]
public async Task QueryAsync_FiltersByResourceType()
{
// Arrange
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated, resourceType: "QuietHourCalendar"));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.ThrottleConfigUpdated, resourceType: "TenantThrottleConfig"));
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
ResourceType = "QuietHourCalendar"
});
// Assert
Assert.Single(results);
Assert.Equal("QuietHourCalendar", results[0].ResourceType);
}
[Fact]
public async Task QueryAsync_FiltersByResourceId()
{
// Arrange
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated, resourceId: "cal-123"));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarUpdated, resourceId: "cal-456"));
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
ResourceId = "cal-123"
});
// Assert
Assert.Single(results);
Assert.Equal("cal-123", results[0].ResourceId);
}
[Fact]
public async Task QueryAsync_AppliesPagination()
{
// Arrange
var now = DateTimeOffset.UtcNow;
for (int i = 0; i < 10; i++)
{
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated, now.AddMinutes(-i)));
}
// Act
var firstPage = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
Limit = 3,
Offset = 0
});
var secondPage = await _logger.QueryAsync(new SuppressionAuditQuery
{
TenantId = "tenant1",
Limit = 3,
Offset = 3
});
// Assert
Assert.Equal(3, firstPage.Count);
Assert.Equal(3, secondPage.Count);
Assert.NotEqual(firstPage[0].EntryId, secondPage[0].EntryId);
}
[Fact]
public async Task QueryAsync_OrdersByTimestampDescending()
{
// Arrange
var now = DateTimeOffset.UtcNow;
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated, now.AddHours(-2)));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarUpdated, now.AddHours(-1)));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarDeleted, now));
// Act
var results = await _logger.QueryAsync(new SuppressionAuditQuery { TenantId = "tenant1" });
// Assert
Assert.Equal(3, results.Count);
Assert.True(results[0].Timestamp > results[1].Timestamp);
Assert.True(results[1].Timestamp > results[2].Timestamp);
}
[Fact]
public async Task LogAsync_TrimsOldEntriesWhenLimitExceeded()
{
// Arrange
var options = new SuppressionAuditOptions { MaxEntriesPerTenant = 5 };
var logger = new InMemorySuppressionAuditLogger(
Options.Create(options),
NullLogger<InMemorySuppressionAuditLogger>.Instance);
// Act - Add more entries than the limit
for (int i = 0; i < 10; i++)
{
await logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated));
}
// Assert
var results = await logger.QueryAsync(new SuppressionAuditQuery { TenantId = "tenant1" });
Assert.Equal(5, results.Count);
}
[Fact]
public async Task LogAsync_IsolatesTenantsCorrectly()
{
// Arrange
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarCreated));
await _logger.LogAsync(CreateEntry("tenant2", SuppressionAuditAction.CalendarUpdated));
await _logger.LogAsync(CreateEntry("tenant1", SuppressionAuditAction.CalendarDeleted));
// Act
var tenant1Results = await _logger.QueryAsync(new SuppressionAuditQuery { TenantId = "tenant1" });
var tenant2Results = await _logger.QueryAsync(new SuppressionAuditQuery { TenantId = "tenant2" });
// Assert
Assert.Equal(2, tenant1Results.Count);
Assert.Single(tenant2Results);
}
private static SuppressionAuditEntry CreateEntry(
string tenantId,
SuppressionAuditAction action,
DateTimeOffset? timestamp = null,
string actor = "system",
string resourceType = "TestResource",
string resourceId = "test-123")
{
return new SuppressionAuditEntry
{
EntryId = Guid.NewGuid().ToString("N")[..16],
TenantId = tenantId,
Timestamp = timestamp ?? DateTimeOffset.UtcNow,
Actor = actor,
Action = action,
ResourceType = resourceType,
ResourceId = resourceId
};
}
}
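// Illustrative sketch only, not part of this commit: a minimal bounded log showing the
// kind of trimming that LogAsync_TrimsOldEntriesWhenLimitExceeded checks (10 entries
// added with a cap of 5 leaves 5). BoundedLogSketch is a hypothetical type; dropping the
// oldest entries first is an assumption about InMemorySuppressionAuditLogger, which the
// test itself does not pin down.
internal sealed class BoundedLogSketch<T>
{
    private readonly System.Collections.Generic.List<T> _entries = new();
    private readonly object _gate = new();
    private readonly int _maxEntries;

    public BoundedLogSketch(int maxEntries)
    {
        if (maxEntries < 1)
        {
            throw new System.ArgumentOutOfRangeException(nameof(maxEntries));
        }

        _maxEntries = maxEntries;
    }

    public void Add(T entry)
    {
        lock (_gate)
        {
            _entries.Add(entry);

            // Once the cap is exceeded, remove the oldest entries from the front.
            if (_entries.Count > _maxEntries)
            {
                _entries.RemoveRange(0, _entries.Count - _maxEntries);
            }
        }
    }

    public int Count
    {
        get
        {
            lock (_gate)
            {
                return _entries.Count;
            }
        }
    }
}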

@@ -0,0 +1,330 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class ThrottleConfigServiceTests
{
private readonly Mock<ISuppressionAuditLogger> _auditLogger;
private readonly FakeTimeProvider _timeProvider;
private readonly ThrottlerOptions _globalOptions;
private readonly InMemoryThrottleConfigService _service;
public ThrottleConfigServiceTests()
{
_auditLogger = new Mock<ISuppressionAuditLogger>();
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 14, 0, 0, TimeSpan.Zero));
_globalOptions = new ThrottlerOptions
{
Enabled = true,
DefaultWindow = TimeSpan.FromMinutes(5),
DefaultMaxEvents = 10
};
_service = new InMemoryThrottleConfigService(
_auditLogger.Object,
Options.Create(_globalOptions),
_timeProvider,
NullLogger<InMemoryThrottleConfigService>.Instance);
}
[Fact]
public async Task GetEffectiveConfigAsync_ReturnsGlobalDefaultsWhenNoTenantConfig()
{
// Act
var config = await _service.GetEffectiveConfigAsync("tenant1", "vulnerability.found");
// Assert
Assert.True(config.Enabled);
Assert.Equal(TimeSpan.FromMinutes(5), config.Window);
Assert.Equal(10, config.MaxEvents);
Assert.Equal("global", config.Source);
}
[Fact]
public async Task SetTenantConfigAsync_CreatesTenantConfig()
{
// Arrange
var update = new TenantThrottleConfigUpdate
{
Enabled = true,
DefaultWindow = TimeSpan.FromMinutes(10),
DefaultMaxEvents = 20
};
// Act
var config = await _service.SetTenantConfigAsync("tenant1", update, "admin");
// Assert
Assert.Equal("tenant1", config.TenantId);
Assert.True(config.Enabled);
Assert.Equal(TimeSpan.FromMinutes(10), config.DefaultWindow);
Assert.Equal(20, config.DefaultMaxEvents);
Assert.Equal("admin", config.UpdatedBy);
}
[Fact]
public async Task SetTenantConfigAsync_LogsAuditEntry()
{
// Arrange
var update = new TenantThrottleConfigUpdate { DefaultMaxEvents = 50 };
// Act
await _service.SetTenantConfigAsync("tenant1", update, "admin");
// Assert
_auditLogger.Verify(a => a.LogAsync(
It.Is<SuppressionAuditEntry>(e =>
e.Action == SuppressionAuditAction.ThrottleConfigUpdated &&
e.ResourceType == "TenantThrottleConfig" &&
e.Actor == "admin"),
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task GetEffectiveConfigAsync_UsesTenantConfigWhenSet()
{
// Arrange
await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate
{
DefaultWindow = TimeSpan.FromMinutes(15),
DefaultMaxEvents = 25
}, "admin");
// Act
var config = await _service.GetEffectiveConfigAsync("tenant1", "event.test");
// Assert
Assert.Equal(TimeSpan.FromMinutes(15), config.Window);
Assert.Equal(25, config.MaxEvents);
Assert.Equal("tenant", config.Source);
}
[Fact]
public async Task SetEventKindConfigAsync_CreatesEventKindOverride()
{
// Arrange
var update = new EventKindThrottleConfigUpdate
{
Window = TimeSpan.FromMinutes(1),
MaxEvents = 5
};
// Act
var config = await _service.SetEventKindConfigAsync("tenant1", "critical.*", update, "admin");
// Assert
Assert.Equal("tenant1", config.TenantId);
Assert.Equal("critical.*", config.EventKindPattern);
Assert.Equal(TimeSpan.FromMinutes(1), config.Window);
Assert.Equal(5, config.MaxEvents);
}
[Fact]
public async Task GetEffectiveConfigAsync_UsesEventKindOverrideWhenMatches()
{
// Arrange
await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate
{
DefaultWindow = TimeSpan.FromMinutes(10),
DefaultMaxEvents = 20
}, "admin");
await _service.SetEventKindConfigAsync("tenant1", "critical.*", new EventKindThrottleConfigUpdate
{
Window = TimeSpan.FromMinutes(1),
MaxEvents = 100
}, "admin");
// Act
var criticalConfig = await _service.GetEffectiveConfigAsync("tenant1", "critical.security.breach");
var normalConfig = await _service.GetEffectiveConfigAsync("tenant1", "info.scan.complete");
// Assert
Assert.Equal("event_kind", criticalConfig.Source);
Assert.Equal(TimeSpan.FromMinutes(1), criticalConfig.Window);
Assert.Equal(100, criticalConfig.MaxEvents);
Assert.Equal("critical.*", criticalConfig.MatchedPattern);
Assert.Equal("tenant", normalConfig.Source);
Assert.Equal(TimeSpan.FromMinutes(10), normalConfig.Window);
Assert.Equal(20, normalConfig.MaxEvents);
}
[Fact]
public async Task GetEffectiveConfigAsync_UsesMoreSpecificPatternFirst()
{
// Arrange
await _service.SetEventKindConfigAsync("tenant1", "vulnerability.*", new EventKindThrottleConfigUpdate
{
MaxEvents = 10,
Priority = 100
}, "admin");
await _service.SetEventKindConfigAsync("tenant1", "vulnerability.critical.*", new EventKindThrottleConfigUpdate
{
MaxEvents = 5,
Priority = 50 // Higher priority (lower number)
}, "admin");
// Act
var specificConfig = await _service.GetEffectiveConfigAsync("tenant1", "vulnerability.critical.cve123");
var generalConfig = await _service.GetEffectiveConfigAsync("tenant1", "vulnerability.low.cve456");
// Assert
Assert.Equal(5, specificConfig.MaxEvents);
Assert.Equal("vulnerability.critical.*", specificConfig.MatchedPattern);
Assert.Equal(10, generalConfig.MaxEvents);
Assert.Equal("vulnerability.*", generalConfig.MatchedPattern);
}
[Fact]
public async Task GetEffectiveConfigAsync_DisabledEventKindDisablesThrottling()
{
// Arrange
await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate
{
Enabled = true,
DefaultMaxEvents = 20
}, "admin");
await _service.SetEventKindConfigAsync("tenant1", "info.*", new EventKindThrottleConfigUpdate
{
Enabled = false
}, "admin");
// Act
var config = await _service.GetEffectiveConfigAsync("tenant1", "info.log");
// Assert
Assert.False(config.Enabled);
Assert.Equal("event_kind", config.Source);
}
[Fact]
public async Task ListEventKindConfigsAsync_ReturnsAllConfigsForTenant()
{
// Arrange
await _service.SetEventKindConfigAsync("tenant1", "critical.*", new EventKindThrottleConfigUpdate { MaxEvents = 5, Priority = 10 }, "admin");
await _service.SetEventKindConfigAsync("tenant1", "info.*", new EventKindThrottleConfigUpdate { MaxEvents = 100, Priority = 100 }, "admin");
await _service.SetEventKindConfigAsync("tenant2", "other.*", new EventKindThrottleConfigUpdate { MaxEvents = 50 }, "admin");
// Act
var configs = await _service.ListEventKindConfigsAsync("tenant1");
// Assert
Assert.Equal(2, configs.Count);
Assert.Equal("critical.*", configs[0].EventKindPattern); // Lowest Priority value (highest precedence) first
Assert.Equal("info.*", configs[1].EventKindPattern);
}
[Fact]
public async Task RemoveEventKindConfigAsync_RemovesConfig()
{
// Arrange
await _service.SetEventKindConfigAsync("tenant1", "test.*", new EventKindThrottleConfigUpdate { MaxEvents = 5 }, "admin");
// Act
var removed = await _service.RemoveEventKindConfigAsync("tenant1", "test.*", "admin");
// Assert
Assert.True(removed);
var configs = await _service.ListEventKindConfigsAsync("tenant1");
Assert.Empty(configs);
}
[Fact]
public async Task RemoveEventKindConfigAsync_LogsAuditEntry()
{
// Arrange
await _service.SetEventKindConfigAsync("tenant1", "test.*", new EventKindThrottleConfigUpdate { MaxEvents = 5 }, "admin");
// Act
await _service.RemoveEventKindConfigAsync("tenant1", "test.*", "admin");
// Assert
_auditLogger.Verify(a => a.LogAsync(
It.Is<SuppressionAuditEntry>(e =>
e.Action == SuppressionAuditAction.ThrottleConfigDeleted &&
e.ResourceId == "test.*"),
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task GetTenantConfigAsync_ReturnsNullWhenNotSet()
{
// Act
var config = await _service.GetTenantConfigAsync("nonexistent");
// Assert
Assert.Null(config);
}
[Fact]
public async Task GetTenantConfigAsync_ReturnsConfigWhenSet()
{
// Arrange
await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate { DefaultMaxEvents = 50 }, "admin");
// Act
var config = await _service.GetTenantConfigAsync("tenant1");
// Assert
Assert.NotNull(config);
Assert.Equal(50, config.DefaultMaxEvents);
}
[Fact]
public async Task SetTenantConfigAsync_UpdatesExistingConfig()
{
// Arrange
await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate { DefaultMaxEvents = 10 }, "admin1");
// Act
var updated = await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate { DefaultMaxEvents = 20 }, "admin2");
// Assert
Assert.Equal(20, updated.DefaultMaxEvents);
Assert.Equal("admin2", updated.UpdatedBy);
}
[Fact]
public async Task GetEffectiveConfigAsync_IncludesBurstAllowanceAndCooldown()
{
// Arrange
await _service.SetTenantConfigAsync("tenant1", new TenantThrottleConfigUpdate
{
BurstAllowance = 5,
CooldownPeriod = TimeSpan.FromMinutes(10)
}, "admin");
// Act
var config = await _service.GetEffectiveConfigAsync("tenant1", "event.test");
// Assert
Assert.Equal(5, config.BurstAllowance);
Assert.Equal(TimeSpan.FromMinutes(10), config.CooldownPeriod);
}
[Fact]
public async Task GetEffectiveConfigAsync_WildcardPatternMatchesAllEvents()
{
// Arrange
await _service.SetEventKindConfigAsync("tenant1", "*", new EventKindThrottleConfigUpdate
{
MaxEvents = 1000,
Priority = 1000 // Very low priority
}, "admin");
// Act
var config = await _service.GetEffectiveConfigAsync("tenant1", "any.event.kind.here");
// Assert
Assert.Equal(1000, config.MaxEvents);
Assert.Equal("*", config.MatchedPattern);
}
}
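
The tests above imply a resolution order for event-kind patterns: every matching pattern is a candidate, the lowest `Priority` value wins, and `*` acts as a catch-all. The following is a minimal sketch of that rule, assuming simple glob semantics (a lone `*`, or a trailing `.*`). `PatternRule` and `PatternResolver` are hypothetical names used for illustration only; they are not taken from the StellaOps codebase, and the production service may match patterns differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rule shape for illustration; not the production type.
public sealed record PatternRule(string Pattern, int Priority, int MaxEvents);

public static class PatternResolver
{
    // "*" matches everything, "prefix.*" matches any kind starting with
    // "prefix.", and anything else must match exactly.
    public static bool Matches(string pattern, string eventKind)
    {
        if (pattern == "*")
        {
            return true;
        }

        if (pattern.EndsWith(".*", StringComparison.Ordinal))
        {
            var prefix = pattern[..^1]; // drop the '*', keep the trailing dot
            return eventKind.StartsWith(prefix, StringComparison.Ordinal);
        }

        return string.Equals(pattern, eventKind, StringComparison.Ordinal);
    }

    // Lowest Priority value wins. Breaking ties by longer pattern is an
    // assumption of this sketch, not something the tests confirm.
    public static PatternRule? Resolve(IEnumerable<PatternRule> rules, string eventKind) =>
        rules.Where(r => Matches(r.Pattern, eventKind))
             .OrderBy(r => r.Priority)
             .ThenByDescending(r => r.Pattern.Length)
             .FirstOrDefault();
}
```

With rules `vulnerability.*` (priority 100) and `vulnerability.critical.*` (priority 50), resolving `vulnerability.critical.cve123` selects the second rule, which mirrors `GetEffectiveConfigAsync_UsesMoreSpecificPatternFirst`.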


@@ -0,0 +1,291 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notify.Storage.Mongo.Repositories;
using StellaOps.Notifier.Worker.Correlation;
namespace StellaOps.Notifier.Tests.Correlation;
public class ThrottleConfigurationServiceTests
{
private readonly Mock<INotifyAuditRepository> _auditRepository;
private readonly FakeTimeProvider _timeProvider;
private readonly InMemoryThrottleConfigurationService _service;
public ThrottleConfigurationServiceTests()
{
_auditRepository = new Mock<INotifyAuditRepository>();
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 10, 0, 0, TimeSpan.Zero));
_service = new InMemoryThrottleConfigurationService(
_auditRepository.Object,
_timeProvider,
NullLogger<InMemoryThrottleConfigurationService>.Instance);
}
[Fact]
public async Task GetConfigurationAsync_NoConfiguration_ReturnsNull()
{
// Act
var result = await _service.GetConfigurationAsync("tenant1");
// Assert
Assert.Null(result);
}
[Fact]
public async Task UpsertConfigurationAsync_NewConfiguration_CreatesConfiguration()
{
// Arrange
var config = CreateTestConfiguration("tenant1");
// Act
var result = await _service.UpsertConfigurationAsync(config, "admin");
// Assert
Assert.Equal("tenant1", result.TenantId);
Assert.Equal(TimeSpan.FromMinutes(30), result.DefaultDuration);
Assert.Equal(_timeProvider.GetUtcNow(), result.CreatedAt);
Assert.Equal("admin", result.CreatedBy);
}
[Fact]
public async Task UpsertConfigurationAsync_ExistingConfiguration_UpdatesConfiguration()
{
// Arrange
var config = CreateTestConfiguration("tenant1");
await _service.UpsertConfigurationAsync(config, "admin");
_timeProvider.Advance(TimeSpan.FromMinutes(5));
var updated = config with { DefaultDuration = TimeSpan.FromMinutes(60) };
// Act
var result = await _service.UpsertConfigurationAsync(updated, "admin2");
// Assert
Assert.Equal(TimeSpan.FromMinutes(60), result.DefaultDuration);
Assert.Equal("admin", result.CreatedBy); // Original creator preserved
Assert.Equal("admin2", result.UpdatedBy);
}
[Fact]
public async Task DeleteConfigurationAsync_ExistingConfiguration_ReturnsTrue()
{
// Arrange
var config = CreateTestConfiguration("tenant1");
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.DeleteConfigurationAsync("tenant1", "admin");
// Assert
Assert.True(result);
Assert.Null(await _service.GetConfigurationAsync("tenant1"));
}
[Fact]
public async Task DeleteConfigurationAsync_NonExistentConfiguration_ReturnsFalse()
{
// Act
var result = await _service.DeleteConfigurationAsync("tenant1", "admin");
// Assert
Assert.False(result);
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_NoConfiguration_ReturnsDefault()
{
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "event.test");
// Assert
Assert.Equal(TimeSpan.FromMinutes(15), result); // Built-in default when no configuration exists
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_WithConfiguration_ReturnsConfiguredDuration()
{
// Arrange
var config = CreateTestConfiguration("tenant1") with
{
DefaultDuration = TimeSpan.FromMinutes(45)
};
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "event.test");
// Assert
Assert.Equal(TimeSpan.FromMinutes(45), result);
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_DisabledConfiguration_ReturnsDefault()
{
// Arrange
var config = CreateTestConfiguration("tenant1") with
{
DefaultDuration = TimeSpan.FromMinutes(45),
Enabled = false
};
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "event.test");
// Assert
Assert.Equal(TimeSpan.FromMinutes(15), result); // Default when disabled
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_WithExactMatchOverride_ReturnsOverride()
{
// Arrange
var config = CreateTestConfiguration("tenant1") with
{
DefaultDuration = TimeSpan.FromMinutes(30),
EventKindOverrides = new Dictionary<string, TimeSpan>
{
["critical.alert"] = TimeSpan.FromMinutes(5)
}
};
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "critical.alert");
// Assert
Assert.Equal(TimeSpan.FromMinutes(5), result);
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_WithPrefixMatchOverride_ReturnsOverride()
{
// Arrange
var config = CreateTestConfiguration("tenant1") with
{
DefaultDuration = TimeSpan.FromMinutes(30),
EventKindOverrides = new Dictionary<string, TimeSpan>
{
["critical."] = TimeSpan.FromMinutes(5)
}
};
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "critical.alert.high");
// Assert
Assert.Equal(TimeSpan.FromMinutes(5), result);
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_WithMultipleOverrides_ReturnsLongestPrefixMatch()
{
// Arrange
var config = CreateTestConfiguration("tenant1") with
{
DefaultDuration = TimeSpan.FromMinutes(30),
EventKindOverrides = new Dictionary<string, TimeSpan>
{
["critical."] = TimeSpan.FromMinutes(5),
["critical.alert."] = TimeSpan.FromMinutes(2)
}
};
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "critical.alert.security");
// Assert - Should match the more specific override
Assert.Equal(TimeSpan.FromMinutes(2), result);
}
[Fact]
public async Task GetEffectiveThrottleDurationAsync_NoMatchingOverride_ReturnsDefault()
{
// Arrange
var config = CreateTestConfiguration("tenant1") with
{
DefaultDuration = TimeSpan.FromMinutes(30),
EventKindOverrides = new Dictionary<string, TimeSpan>
{
["critical."] = TimeSpan.FromMinutes(5)
}
};
await _service.UpsertConfigurationAsync(config, "admin");
// Act
var result = await _service.GetEffectiveThrottleDurationAsync("tenant1", "info.status");
// Assert
Assert.Equal(TimeSpan.FromMinutes(30), result);
}
[Fact]
public async Task UpsertConfigurationAsync_AuditsCreation()
{
// Arrange
var config = CreateTestConfiguration("tenant1");
// Act
await _service.UpsertConfigurationAsync(config, "admin");
// Assert
_auditRepository.Verify(a => a.AppendAsync(
"tenant1",
"throttle_config_created",
It.IsAny<Dictionary<string, string>>(),
"admin",
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task UpsertConfigurationAsync_AuditsUpdate()
{
// Arrange
var config = CreateTestConfiguration("tenant1");
await _service.UpsertConfigurationAsync(config, "admin");
_auditRepository.Invocations.Clear();
// Act
await _service.UpsertConfigurationAsync(config with { DefaultDuration = TimeSpan.FromHours(1) }, "admin2");
// Assert
_auditRepository.Verify(a => a.AppendAsync(
"tenant1",
"throttle_config_updated",
It.IsAny<Dictionary<string, string>>(),
"admin2",
It.IsAny<CancellationToken>()), Times.Once);
}
[Fact]
public async Task DeleteConfigurationAsync_AuditsDeletion()
{
// Arrange
var config = CreateTestConfiguration("tenant1");
await _service.UpsertConfigurationAsync(config, "admin");
_auditRepository.Invocations.Clear();
// Act
await _service.DeleteConfigurationAsync("tenant1", "admin");
// Assert
_auditRepository.Verify(a => a.AppendAsync(
"tenant1",
"throttle_config_deleted",
It.IsAny<Dictionary<string, string>>(),
"admin",
It.IsAny<CancellationToken>()), Times.Once);
}
private static ThrottleConfiguration CreateTestConfiguration(string tenantId) => new()
{
TenantId = tenantId,
DefaultDuration = TimeSpan.FromMinutes(30),
Enabled = true
};
}
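
The override tests above describe a lookup in which an exact event-kind key wins, otherwise the longest matching prefix applies, and otherwise the configured default is returned. The following sketch shows one way to express that, assuming overrides are plain string prefixes as in the test data (`"critical."`, `"critical.alert."`). `ThrottleDurationResolver` is a hypothetical helper for illustration; how `InMemoryThrottleConfigurationService` actually resolves overrides is not shown in this diff.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not the production implementation.
public static class ThrottleDurationResolver
{
    public static TimeSpan Resolve(
        IReadOnlyDictionary<string, TimeSpan> overrides,
        string eventKind,
        TimeSpan fallback)
    {
        // An exact key match takes precedence over any prefix.
        if (overrides.TryGetValue(eventKind, out var exact))
        {
            return exact;
        }

        // Otherwise choose the longest key that is a prefix of the event kind.
        var best = overrides
            .Where(kv => eventKind.StartsWith(kv.Key, StringComparison.Ordinal))
            .OrderByDescending(kv => kv.Key.Length)
            .Select(kv => (TimeSpan?)kv.Value)
            .FirstOrDefault();

        return best ?? fallback;
    }
}
```

The sketch covers override resolution only. The disabled-configuration case, where the tests expect the 15-minute built-in default, would be handled before this lookup.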


@@ -0,0 +1,296 @@
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using StellaOps.Notifier.Worker.Correlation;
using StellaOps.Notifier.Worker.Digest;
using Xunit;
namespace StellaOps.Notifier.Tests.Digest;
public sealed class DigestGeneratorTests
{
private readonly InMemoryIncidentManager _incidentManager;
private readonly DigestGenerator _generator;
private readonly FakeTimeProvider _timeProvider;
public DigestGeneratorTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.Parse("2025-11-27T12:00:00Z"));
var incidentOptions = Options.Create(new IncidentManagerOptions
{
CorrelationWindow = TimeSpan.FromHours(1),
ReopenOnNewEvent = true
});
_incidentManager = new InMemoryIncidentManager(
incidentOptions,
_timeProvider,
new NullLogger<InMemoryIncidentManager>());
var digestOptions = Options.Create(new DigestOptions
{
MaxIncidentsPerDigest = 50,
TopAffectedCount = 5,
RenderContent = true,
RenderSlackBlocks = true,
SkipEmptyDigests = true
});
_generator = new DigestGenerator(
_incidentManager,
digestOptions,
_timeProvider,
new NullLogger<DigestGenerator>());
}
[Fact]
public async Task GenerateAsync_EmptyTenant_ReturnsEmptyDigest()
{
// Arrange
var query = DigestQuery.LastHours(24, _timeProvider.GetUtcNow());
// Act
var result = await _generator.GenerateAsync("tenant-1", query);
// Assert
Assert.NotNull(result);
Assert.Equal("tenant-1", result.TenantId);
Assert.Empty(result.Incidents);
Assert.Equal(0, result.Summary.TotalEvents);
Assert.Equal(0, result.Summary.NewIncidents);
Assert.False(result.Summary.HasActivity);
}
[Fact]
public async Task GenerateAsync_WithIncidents_ReturnsSummary()
{
// Arrange
var incident = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "vuln:critical:pkg-foo", "vulnerability.detected", "Critical vulnerability in pkg-foo");
await _incidentManager.RecordEventAsync("tenant-1", incident.IncidentId, "evt-1");
await _incidentManager.RecordEventAsync("tenant-1", incident.IncidentId, "evt-2");
var query = DigestQuery.LastHours(24, _timeProvider.GetUtcNow());
// Act
var result = await _generator.GenerateAsync("tenant-1", query);
// Assert
Assert.Single(result.Incidents);
Assert.Equal(2, result.Summary.TotalEvents);
Assert.Equal(1, result.Summary.NewIncidents);
Assert.Equal(1, result.Summary.OpenIncidents);
Assert.True(result.Summary.HasActivity);
}
[Fact]
public async Task GenerateAsync_MultipleIncidents_GroupsByEventKind()
{
// Arrange
var inc1 = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key1", "vulnerability.detected", "Vuln 1");
await _incidentManager.RecordEventAsync("tenant-1", inc1.IncidentId, "evt-1");
var inc2 = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key2", "vulnerability.detected", "Vuln 2");
await _incidentManager.RecordEventAsync("tenant-1", inc2.IncidentId, "evt-2");
var inc3 = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key3", "pack.approval.required", "Approval needed");
await _incidentManager.RecordEventAsync("tenant-1", inc3.IncidentId, "evt-3");
var query = DigestQuery.LastHours(24, _timeProvider.GetUtcNow());
// Act
var result = await _generator.GenerateAsync("tenant-1", query);
// Assert
Assert.Equal(3, result.Incidents.Count);
Assert.Equal(3, result.Summary.TotalEvents);
Assert.Contains("vulnerability.detected", result.Summary.ByEventKind.Keys);
Assert.Contains("pack.approval.required", result.Summary.ByEventKind.Keys);
Assert.Equal(2, result.Summary.ByEventKind["vulnerability.detected"]);
Assert.Equal(1, result.Summary.ByEventKind["pack.approval.required"]);
}
[Fact]
public async Task GenerateAsync_RendersContent()
{
// Arrange
var incident = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key", "vulnerability.detected", "Critical issue");
await _incidentManager.RecordEventAsync("tenant-1", incident.IncidentId, "evt-1");
var query = DigestQuery.LastHours(24, _timeProvider.GetUtcNow());
// Act
var result = await _generator.GenerateAsync("tenant-1", query);
// Assert
Assert.NotNull(result.Content);
Assert.NotEmpty(result.Content.PlainText!);
Assert.NotEmpty(result.Content.Markdown!);
Assert.NotEmpty(result.Content.Html!);
Assert.NotEmpty(result.Content.Json!);
Assert.NotEmpty(result.Content.SlackBlocks!);
Assert.Contains("Notification Digest", result.Content.PlainText);
Assert.Contains("tenant-1", result.Content.PlainText);
Assert.Contains("Critical issue", result.Content.PlainText);
}
[Fact]
public async Task GenerateAsync_RespectsMaxIncidents()
{
// Arrange
for (var i = 0; i < 10; i++)
{
var inc = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", $"key-{i}", "test.event", $"Test incident {i}");
await _incidentManager.RecordEventAsync("tenant-1", inc.IncidentId, $"evt-{i}");
}
var query = new DigestQuery
{
From = _timeProvider.GetUtcNow().AddDays(-1),
To = _timeProvider.GetUtcNow(),
MaxIncidents = 5
};
// Act
var result = await _generator.GenerateAsync("tenant-1", query);
// Assert
Assert.Equal(5, result.Incidents.Count);
Assert.Equal(10, result.TotalIncidentCount);
Assert.True(result.HasMore);
}
[Fact]
public async Task GenerateAsync_FiltersResolvedIncidents()
{
// Arrange
var openInc = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key-open", "test.event", "Open incident");
await _incidentManager.RecordEventAsync("tenant-1", openInc.IncidentId, "evt-1");
var resolvedInc = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key-resolved", "test.event", "Resolved incident");
await _incidentManager.RecordEventAsync("tenant-1", resolvedInc.IncidentId, "evt-2");
await _incidentManager.ResolveAsync("tenant-1", resolvedInc.IncidentId, "system", "Auto-resolved");
var queryExcludeResolved = new DigestQuery
{
From = _timeProvider.GetUtcNow().AddDays(-1),
To = _timeProvider.GetUtcNow(),
IncludeResolved = false
};
var queryIncludeResolved = new DigestQuery
{
From = _timeProvider.GetUtcNow().AddDays(-1),
To = _timeProvider.GetUtcNow(),
IncludeResolved = true
};
// Act
var resultExclude = await _generator.GenerateAsync("tenant-1", queryExcludeResolved);
var resultInclude = await _generator.GenerateAsync("tenant-1", queryIncludeResolved);
// Assert
Assert.Single(resultExclude.Incidents);
Assert.Equal("Open incident", resultExclude.Incidents[0].Title);
Assert.Equal(2, resultInclude.Incidents.Count);
}
[Fact]
public async Task GenerateAsync_FiltersEventKinds()
{
// Arrange
var vulnInc = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key-vuln", "vulnerability.detected", "Vulnerability");
await _incidentManager.RecordEventAsync("tenant-1", vulnInc.IncidentId, "evt-1");
var approvalInc = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key-approval", "pack.approval.required", "Approval");
await _incidentManager.RecordEventAsync("tenant-1", approvalInc.IncidentId, "evt-2");
var query = new DigestQuery
{
From = _timeProvider.GetUtcNow().AddDays(-1),
To = _timeProvider.GetUtcNow(),
EventKinds = ["vulnerability.detected"]
};
// Act
var result = await _generator.GenerateAsync("tenant-1", query);
// Assert
Assert.Single(result.Incidents);
Assert.Equal("vulnerability.detected", result.Incidents[0].EventKind);
}
[Fact]
public async Task PreviewAsync_SetsIsPreviewFlag()
{
// Arrange
var incident = await _incidentManager.GetOrCreateIncidentAsync(
"tenant-1", "key", "test.event", "Test");
await _incidentManager.RecordEventAsync("tenant-1", incident.IncidentId, "evt-1");
var query = DigestQuery.LastHours(24, _timeProvider.GetUtcNow());
// Act
var result = await _generator.PreviewAsync("tenant-1", query);
// Assert
Assert.True(result.IsPreview);
}
[Fact]
public void DigestQuery_LastHours_CalculatesCorrectWindow()
{
// Arrange
var asOf = DateTimeOffset.Parse("2025-11-27T12:00:00Z");
// Act
var query = DigestQuery.LastHours(6, asOf);
// Assert
Assert.Equal(DateTimeOffset.Parse("2025-11-27T06:00:00Z"), query.From);
Assert.Equal(asOf, query.To);
}
[Fact]
public void DigestQuery_LastDays_CalculatesCorrectWindow()
{
// Arrange
var asOf = DateTimeOffset.Parse("2025-11-27T12:00:00Z");
// Act
var query = DigestQuery.LastDays(7, asOf);
// Assert
Assert.Equal(DateTimeOffset.Parse("2025-11-20T12:00:00Z"), query.From);
Assert.Equal(asOf, query.To);
}
private sealed class FakeTimeProvider : TimeProvider
{
private DateTimeOffset _utcNow;
public FakeTimeProvider(DateTimeOffset utcNow) => _utcNow = utcNow;
public override DateTimeOffset GetUtcNow() => _utcNow;
public void Advance(TimeSpan duration) => _utcNow = _utcNow.Add(duration);
}
private sealed class NullLogger<T> : ILogger<T>
{
public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;
public bool IsEnabled(LogLevel logLevel) => false;
public void Log<TState>(LogLevel logLevel, EventId eventId, TState state, Exception? exception, Func<TState, Exception?, string> formatter) { }
}
}
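
The two `DigestQuery` factory tests above pin down the window arithmetic: the window ends at the supplied `asOf` instant and starts the given number of hours or days earlier. A minimal sketch of that calculation follows. `DigestWindow` is a hypothetical stand-in used for illustration; the real `DigestQuery` carries additional properties (`MaxIncidents`, `IncludeResolved`, `EventKinds`) that are omitted here.

```csharp
using System;

// Hypothetical stand-in for the From/To part of DigestQuery.
public sealed record DigestWindow(DateTimeOffset From, DateTimeOffset To)
{
    // Window covering the last `hours` hours, ending at `asOf`.
    public static DigestWindow LastHours(int hours, DateTimeOffset asOf) =>
        new(asOf.AddHours(-hours), asOf);

    // Window covering the last `days` days, ending at `asOf`.
    public static DigestWindow LastDays(int days, DateTimeOffset asOf) =>
        new(asOf.AddDays(-days), asOf);
}
```

Passing `asOf` explicitly rather than reading the clock inside the factory keeps the calculation deterministic, which is why the tests can assert exact timestamps.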


@@ -0,0 +1,250 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Digest;
namespace StellaOps.Notifier.Tests.Digest;
public class InMemoryDigestSchedulerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly InMemoryDigestScheduler _scheduler;
public InMemoryDigestSchedulerTests()
{
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 10, 0, 0, TimeSpan.Zero));
_scheduler = new InMemoryDigestScheduler(
_timeProvider,
NullLogger<InMemoryDigestScheduler>.Instance);
}
[Fact]
public async Task UpsertScheduleAsync_CreatesNewSchedule()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1");
// Act
var result = await _scheduler.UpsertScheduleAsync(schedule);
// Assert
Assert.NotNull(result);
Assert.Equal("schedule-1", result.ScheduleId);
Assert.NotNull(result.NextRunAt);
}
[Fact]
public async Task UpsertScheduleAsync_UpdatesExistingSchedule()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1");
await _scheduler.UpsertScheduleAsync(schedule);
var updated = schedule with { Name = "Updated Name" };
// Act
var result = await _scheduler.UpsertScheduleAsync(updated);
// Assert
Assert.Equal("Updated Name", result.Name);
}
[Fact]
public async Task GetScheduleAsync_ReturnsSchedule()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1");
await _scheduler.UpsertScheduleAsync(schedule);
// Act
var result = await _scheduler.GetScheduleAsync("tenant1", "schedule-1");
// Assert
Assert.NotNull(result);
Assert.Equal("schedule-1", result.ScheduleId);
}
[Fact]
public async Task GetScheduleAsync_ReturnsNullForUnknown()
{
// Act
var result = await _scheduler.GetScheduleAsync("tenant1", "unknown");
// Assert
Assert.Null(result);
}
[Fact]
public async Task GetSchedulesAsync_ReturnsTenantSchedules()
{
// Arrange
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-1", "tenant1"));
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-2", "tenant1"));
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-3", "tenant2"));
// Act
var result = await _scheduler.GetSchedulesAsync("tenant1");
// Assert
Assert.Equal(2, result.Count);
Assert.All(result, s => Assert.Equal("tenant1", s.TenantId));
}
[Fact]
public async Task DeleteScheduleAsync_RemovesSchedule()
{
// Arrange
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-1"));
// Act
var deleted = await _scheduler.DeleteScheduleAsync("tenant1", "schedule-1");
// Assert
Assert.True(deleted);
var result = await _scheduler.GetScheduleAsync("tenant1", "schedule-1");
Assert.Null(result);
}
[Fact]
public async Task DeleteScheduleAsync_ReturnsFalseForUnknown()
{
// Act
var deleted = await _scheduler.DeleteScheduleAsync("tenant1", "unknown");
// Assert
Assert.False(deleted);
}
[Fact]
public async Task GetDueSchedulesAsync_ReturnsDueSchedules()
{
// Arrange - create a schedule that should run every minute
var schedule = CreateTestSchedule("schedule-1") with
{
CronExpression = "0 * * * * *" // Every minute
};
await _scheduler.UpsertScheduleAsync(schedule);
// Advance time past next run
_timeProvider.Advance(TimeSpan.FromMinutes(2));
// Act
var dueSchedules = await _scheduler.GetDueSchedulesAsync(_timeProvider.GetUtcNow());
// Assert
Assert.Single(dueSchedules);
Assert.Equal("schedule-1", dueSchedules[0].ScheduleId);
}
[Fact]
public async Task GetDueSchedulesAsync_ExcludesDisabledSchedules()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1") with
{
Enabled = false,
CronExpression = "0 * * * * *"
};
await _scheduler.UpsertScheduleAsync(schedule);
_timeProvider.Advance(TimeSpan.FromMinutes(2));
// Act
var dueSchedules = await _scheduler.GetDueSchedulesAsync(_timeProvider.GetUtcNow());
// Assert
Assert.Empty(dueSchedules);
}
[Fact]
public async Task UpdateLastRunAsync_UpdatesTimestamps()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1") with
{
CronExpression = "0 0 * * * *" // Every hour
};
await _scheduler.UpsertScheduleAsync(schedule);
var runTime = _timeProvider.GetUtcNow();
// Act
await _scheduler.UpdateLastRunAsync("tenant1", "schedule-1", runTime);
// Assert
var updated = await _scheduler.GetScheduleAsync("tenant1", "schedule-1");
Assert.NotNull(updated);
Assert.Equal(runTime, updated.LastRunAt);
Assert.NotNull(updated.NextRunAt);
Assert.True(updated.NextRunAt > runTime);
}
[Fact]
public async Task UpsertScheduleAsync_CalculatesNextRunWithTimezone()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1") with
{
CronExpression = "0 0 9 * * *", // 9 AM every day
Timezone = "America/New_York"
};
// Act
var result = await _scheduler.UpsertScheduleAsync(schedule);
// Assert
Assert.NotNull(result.NextRunAt);
}
[Fact]
public async Task UpsertScheduleAsync_HandlesInvalidCron()
{
// Arrange
var schedule = CreateTestSchedule("schedule-1") with
{
CronExpression = "invalid-cron"
};
// Act
var result = await _scheduler.UpsertScheduleAsync(schedule);
// Assert
Assert.Null(result.NextRunAt);
}
[Fact]
public async Task GetSchedulesAsync_OrdersByName()
{
// Arrange
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-c") with { Name = "Charlie" });
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-a") with { Name = "Alpha" });
await _scheduler.UpsertScheduleAsync(CreateTestSchedule("schedule-b") with { Name = "Bravo" });
// Act
var result = await _scheduler.GetSchedulesAsync("tenant1");
// Assert
Assert.Equal(3, result.Count);
Assert.Equal("Alpha", result[0].Name);
Assert.Equal("Bravo", result[1].Name);
Assert.Equal("Charlie", result[2].Name);
}
private DigestSchedule CreateTestSchedule(string id, string tenantId = "tenant1")
{
return new DigestSchedule
{
ScheduleId = id,
TenantId = tenantId,
Name = $"Test Schedule {id}",
Enabled = true,
CronExpression = "0 0 8 * * *", // 8 AM daily
DigestType = DigestType.Daily,
Format = DigestFormat.Html,
CreatedAt = _timeProvider.GetUtcNow(),
Recipients =
[
new DigestRecipient { Type = "email", Address = "test@example.com" }
]
};
}
}


@@ -0,0 +1,271 @@
using System.Text.Json.Nodes;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Dispatch;
using Xunit;
namespace StellaOps.Notifier.Tests.Dispatch;
public sealed class SimpleTemplateRendererTests
{
private readonly SimpleTemplateRenderer _renderer;
public SimpleTemplateRendererTests()
{
_renderer = new SimpleTemplateRenderer(NullLogger<SimpleTemplateRenderer>.Instance);
}
[Fact]
public async Task RenderAsync_SimpleVariableSubstitution_ReplacesVariables()
{
var template = NotifyTemplate.Create(
templateId: "tpl-1",
tenantId: "tenant-a",
channelType: NotifyChannelType.Slack,
key: "test-template",
locale: "en",
body: "Hello {{actor}}, event {{kind}} occurred.");
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "policy.violation",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: new JsonObject(),
actor: "admin@example.com",
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Contains("Hello admin@example.com", result.Body);
Assert.Contains("event policy.violation occurred", result.Body);
Assert.NotEmpty(result.BodyHash);
}
[Fact]
public async Task RenderAsync_PayloadVariables_FlattenedAndAvailable()
{
var template = NotifyTemplate.Create(
templateId: "tpl-2",
tenantId: "tenant-a",
channelType: NotifyChannelType.Webhook,
key: "payload-test",
locale: "en",
body: "Image: {{image}}, Severity: {{severity}}");
var payload = new JsonObject
{
["image"] = "registry.local/api:v1.0",
["severity"] = "critical"
};
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "scan.complete",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: payload,
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Contains("Image: registry.local/api:v1.0", result.Body);
Assert.Contains("Severity: critical", result.Body);
}
[Fact]
public async Task RenderAsync_NestedPayloadVariables_SupportsDotNotation()
{
var template = NotifyTemplate.Create(
templateId: "tpl-3",
tenantId: "tenant-a",
channelType: NotifyChannelType.Slack,
key: "nested-test",
locale: "en",
body: "Package: {{package.name}} v{{package.version}}");
var payload = new JsonObject
{
["package"] = new JsonObject
{
["name"] = "lodash",
["version"] = "4.17.21"
}
};
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "vulnerability.found",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: payload,
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Contains("Package: lodash v4.17.21", result.Body);
}
[Fact]
public async Task RenderAsync_SensitiveKeys_AreRedacted()
{
var template = NotifyTemplate.Create(
templateId: "tpl-4",
tenantId: "tenant-a",
channelType: NotifyChannelType.Webhook,
key: "redact-test",
locale: "en",
body: "Token: {{apikey}}, User: {{username}}");
var payload = new JsonObject
{
["apikey"] = "secret-token-12345",
["username"] = "testuser"
};
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "auth.event",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: payload,
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Contains("[REDACTED]", result.Body);
Assert.Contains("User: testuser", result.Body);
Assert.DoesNotContain("secret-token-12345", result.Body);
}
[Fact]
public async Task RenderAsync_MissingVariables_ReplacedWithEmptyString()
{
var template = NotifyTemplate.Create(
templateId: "tpl-5",
tenantId: "tenant-a",
channelType: NotifyChannelType.Slack,
key: "missing-test",
locale: "en",
body: "Value: {{nonexistent}}-end");
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "test.event",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: new JsonObject(),
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Equal("Value: -end", result.Body);
}
[Fact]
public async Task RenderAsync_EachBlock_IteratesOverArray()
{
var template = NotifyTemplate.Create(
templateId: "tpl-6",
tenantId: "tenant-a",
channelType: NotifyChannelType.Slack,
key: "each-test",
locale: "en",
body: "Items:{{#each items}} {{this}}{{/each}}");
var payload = new JsonObject
{
["items"] = new JsonArray("alpha", "beta", "gamma")
};
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "list.event",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: payload,
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Contains("alpha", result.Body);
Assert.Contains("beta", result.Body);
Assert.Contains("gamma", result.Body);
}
[Fact]
public async Task RenderAsync_SubjectFromMetadata_RendersSubject()
{
var template = NotifyTemplate.Create(
templateId: "tpl-7",
tenantId: "tenant-a",
channelType: NotifyChannelType.Webhook,
key: "subject-test",
locale: "en",
body: "Body content",
metadata: new[] { new KeyValuePair<string, string>("subject", "Alert: {{kind}}") });
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "critical.alert",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: new JsonObject(),
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Equal("Alert: critical.alert", result.Subject);
}
[Fact]
public async Task RenderAsync_BodyHash_IsConsistent()
{
var template = NotifyTemplate.Create(
templateId: "tpl-8",
tenantId: "tenant-a",
channelType: NotifyChannelType.Slack,
key: "hash-test",
locale: "en",
body: "Static content");
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "test.event",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: new JsonObject(),
version: "1");
var result1 = await _renderer.RenderAsync(template, notifyEvent);
var result2 = await _renderer.RenderAsync(template, notifyEvent);
Assert.Equal(result1.BodyHash, result2.BodyHash);
Assert.Equal(64, result1.BodyHash.Length); // SHA256 hex
}
[Fact]
public async Task RenderAsync_Format_PreservedFromTemplate()
{
var template = NotifyTemplate.Create(
templateId: "tpl-9",
tenantId: "tenant-a",
channelType: NotifyChannelType.Slack,
key: "format-test",
locale: "en",
body: "Content",
format: NotifyDeliveryFormat.Markdown);
var notifyEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: "test.event",
tenant: "tenant-a",
ts: DateTimeOffset.UtcNow,
payload: new JsonObject(),
version: "1");
var result = await _renderer.RenderAsync(template, notifyEvent);
Assert.Equal(NotifyDeliveryFormat.Markdown, result.Format);
}
}

@@ -0,0 +1,242 @@
using System.Net;
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Dispatch;
using Xunit;
namespace StellaOps.Notifier.Tests.Dispatch;
public sealed class WebhookChannelDispatcherTests
{
[Fact]
public void SupportedTypes_IncludesSlackWebhookAndCustom()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.OK);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
Assert.Contains(NotifyChannelType.Slack, dispatcher.SupportedTypes);
Assert.Contains(NotifyChannelType.Webhook, dispatcher.SupportedTypes);
Assert.Contains(NotifyChannelType.Custom, dispatcher.SupportedTypes);
}
[Fact]
public async Task DispatchAsync_SuccessfulDelivery_ReturnsSucceeded()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.OK);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel("https://hooks.example.com/webhook");
var content = CreateContent("Test message");
var delivery = CreateDelivery();
var result = await dispatcher.DispatchAsync(channel, content, delivery);
Assert.True(result.Success);
Assert.Equal(NotifyDeliveryStatus.Delivered, result.Status);
Assert.Equal(1, result.AttemptCount);
}
[Fact]
public async Task DispatchAsync_InvalidEndpoint_ReturnsFailedWithMessage()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.OK);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel("not-a-valid-url");
var content = CreateContent("Test message");
var delivery = CreateDelivery();
var result = await dispatcher.DispatchAsync(channel, content, delivery);
Assert.False(result.Success);
Assert.Equal(NotifyDeliveryStatus.Failed, result.Status);
Assert.Contains("Invalid webhook endpoint", result.ErrorMessage);
}
[Fact]
public async Task DispatchAsync_NullEndpoint_ReturnsFailedWithMessage()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.OK);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel(null);
var content = CreateContent("Test message");
var delivery = CreateDelivery();
var result = await dispatcher.DispatchAsync(channel, content, delivery);
Assert.False(result.Success);
Assert.Contains("Invalid webhook endpoint", result.ErrorMessage);
}
[Fact]
public async Task DispatchAsync_4xxError_ReturnsNonRetryable()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.BadRequest);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel("https://hooks.example.com/webhook");
var content = CreateContent("Test message");
var delivery = CreateDelivery();
var result = await dispatcher.DispatchAsync(channel, content, delivery);
Assert.False(result.Success);
Assert.Equal(NotifyDeliveryStatus.Failed, result.Status);
Assert.False(result.IsRetryable);
}
[Fact]
public async Task DispatchAsync_5xxError_ReturnsRetryable()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.InternalServerError);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel("https://hooks.example.com/webhook");
var content = CreateContent("Test message");
var delivery = CreateDelivery();
var result = await dispatcher.DispatchAsync(channel, content, delivery);
Assert.False(result.Success);
Assert.True(result.IsRetryable);
Assert.Equal(3, result.AttemptCount); // Should retry up to 3 times
}
[Fact]
public async Task DispatchAsync_TooManyRequests_ReturnsRetryable()
{
var handler = new TestHttpMessageHandler(HttpStatusCode.TooManyRequests);
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel("https://hooks.example.com/webhook");
var content = CreateContent("Test message");
var delivery = CreateDelivery();
var result = await dispatcher.DispatchAsync(channel, content, delivery);
Assert.False(result.Success);
Assert.True(result.IsRetryable);
}
[Fact]
public async Task DispatchAsync_SlackChannel_FormatsCorrectly()
{
string? capturedBody = null;
var handler = new TestHttpMessageHandler(HttpStatusCode.OK, req =>
{
capturedBody = req.Content?.ReadAsStringAsync().GetAwaiter().GetResult();
});
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = NotifyChannel.Create(
channelId: "chn-slack",
tenantId: "tenant-a",
name: "Slack Alerts",
type: NotifyChannelType.Slack,
config: NotifyChannelConfig.Create(
secretRef: "secret-ref",
target: "#alerts",
endpoint: "https://hooks.slack.com/services/xxx"));
var content = CreateContent("Alert notification");
var delivery = CreateDelivery();
await dispatcher.DispatchAsync(channel, content, delivery);
Assert.NotNull(capturedBody);
Assert.Contains("\"text\":", capturedBody);
Assert.Contains("\"channel\":", capturedBody);
Assert.Contains("#alerts", capturedBody);
}
[Fact]
public async Task DispatchAsync_GenericWebhook_IncludesDeliveryMetadata()
{
string? capturedBody = null;
var handler = new TestHttpMessageHandler(HttpStatusCode.OK, req =>
{
capturedBody = req.Content?.ReadAsStringAsync().GetAwaiter().GetResult();
});
var client = new HttpClient(handler);
var dispatcher = new WebhookChannelDispatcher(client, NullLogger<WebhookChannelDispatcher>.Instance);
var channel = CreateChannel("https://api.example.com/notifications");
var content = CreateContent("Webhook content");
var delivery = CreateDelivery();
await dispatcher.DispatchAsync(channel, content, delivery);
Assert.NotNull(capturedBody);
Assert.Contains("\"deliveryId\":", capturedBody);
Assert.Contains("\"eventId\":", capturedBody);
Assert.Contains("\"kind\":", capturedBody);
Assert.Contains("\"body\":", capturedBody);
}
private static NotifyChannel CreateChannel(string? endpoint)
{
return NotifyChannel.Create(
channelId: "chn-test",
tenantId: "tenant-a",
name: "Test Channel",
type: NotifyChannelType.Webhook,
config: NotifyChannelConfig.Create(
secretRef: "secret-ref",
endpoint: endpoint));
}
private static NotifyRenderedContent CreateContent(string body)
{
return new NotifyRenderedContent
{
Body = body,
Subject = "Test Subject",
BodyHash = "abc123",
Format = NotifyDeliveryFormat.PlainText
};
}
private static NotifyDelivery CreateDelivery()
{
return NotifyDelivery.Create(
deliveryId: "del-test-001",
tenantId: "tenant-a",
ruleId: "rule-1",
actionId: "act-1",
eventId: Guid.NewGuid(),
kind: "test.event",
status: NotifyDeliveryStatus.Pending);
}
private sealed class TestHttpMessageHandler : HttpMessageHandler
{
private readonly HttpStatusCode _statusCode;
private readonly Action<HttpRequestMessage>? _onRequest;
public TestHttpMessageHandler(HttpStatusCode statusCode, Action<HttpRequestMessage>? onRequest = null)
{
_statusCode = statusCode;
_onRequest = onRequest;
}
protected override Task<HttpResponseMessage> SendAsync(
HttpRequestMessage request,
CancellationToken cancellationToken)
{
_onRequest?.Invoke(request);
return Task.FromResult(new HttpResponseMessage(_statusCode)
{
Content = new StringContent("OK")
});
}
}
}

@@ -0,0 +1,332 @@
using System.Net;
using System.Net.Http.Json;
using System.Text.Json;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.Extensions.DependencyInjection;
using StellaOps.Notifier.WebService.Contracts;
using StellaOps.Notify.Models;
using StellaOps.Notify.Storage.Mongo.Repositories;
using Xunit;
namespace StellaOps.Notifier.Tests.Endpoints;
public sealed class NotifyApiEndpointsTests : IClassFixture<WebApplicationFactory<Program>>
{
private readonly HttpClient _client;
private readonly InMemoryRuleRepository _ruleRepository;
private readonly InMemoryTemplateRepository _templateRepository;
public NotifyApiEndpointsTests(WebApplicationFactory<Program> factory)
{
_ruleRepository = new InMemoryRuleRepository();
_templateRepository = new InMemoryTemplateRepository();
var customFactory = factory.WithWebHostBuilder(builder =>
{
builder.ConfigureServices(services =>
{
services.AddSingleton<INotifyRuleRepository>(_ruleRepository);
services.AddSingleton<INotifyTemplateRepository>(_templateRepository);
});
builder.UseSetting("Environment", "Testing");
});
_client = customFactory.CreateClient();
_client.DefaultRequestHeaders.Add("X-StellaOps-Tenant", "test-tenant");
}
#region Rules API Tests
[Fact]
public async Task GetRules_ReturnsEmptyList_WhenNoRules()
{
// Act
var response = await _client.GetAsync("/api/v2/notify/rules");
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var rules = await response.Content.ReadFromJsonAsync<List<RuleResponse>>();
Assert.NotNull(rules);
Assert.Empty(rules);
}
[Fact]
public async Task CreateRule_ReturnsCreated_WithValidRequest()
{
// Arrange
var request = new RuleCreateRequest
{
RuleId = "rule-001",
Name = "Test Rule",
Description = "Test description",
Enabled = true,
Match = new RuleMatchRequest
{
EventKinds = ["pack.approval.granted"],
Labels = ["env=prod"]
},
Actions =
[
new RuleActionRequest
{
ActionId = "action-001",
Channel = "slack:alerts",
Template = "tmpl-slack-001"
}
]
};
// Act
var response = await _client.PostAsJsonAsync("/api/v2/notify/rules", request);
// Assert
Assert.Equal(HttpStatusCode.Created, response.StatusCode);
var rule = await response.Content.ReadFromJsonAsync<RuleResponse>();
Assert.NotNull(rule);
Assert.Equal("rule-001", rule.RuleId);
Assert.Equal("Test Rule", rule.Name);
}
[Fact]
public async Task GetRule_ReturnsRule_WhenExists()
{
// Arrange
var rule = NotifyRule.Create(
ruleId: "rule-get-001",
tenantId: "test-tenant",
name: "Existing Rule",
match: NotifyRuleMatch.Create(eventKinds: ["test.event"]),
actions: []);
await _ruleRepository.UpsertAsync(rule);
// Act
var response = await _client.GetAsync("/api/v2/notify/rules/rule-get-001");
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var result = await response.Content.ReadFromJsonAsync<RuleResponse>();
Assert.NotNull(result);
Assert.Equal("rule-get-001", result.RuleId);
}
[Fact]
public async Task GetRule_ReturnsNotFound_WhenNotExists()
{
// Act
var response = await _client.GetAsync("/api/v2/notify/rules/nonexistent");
// Assert
Assert.Equal(HttpStatusCode.NotFound, response.StatusCode);
}
[Fact]
public async Task DeleteRule_ReturnsNoContent_WhenExists()
{
// Arrange
var rule = NotifyRule.Create(
ruleId: "rule-delete-001",
tenantId: "test-tenant",
name: "Delete Me",
match: NotifyRuleMatch.Create(),
actions: []);
await _ruleRepository.UpsertAsync(rule);
// Act
var response = await _client.DeleteAsync("/api/v2/notify/rules/rule-delete-001");
// Assert
Assert.Equal(HttpStatusCode.NoContent, response.StatusCode);
}
#endregion
#region Templates API Tests
[Fact]
public async Task GetTemplates_ReturnsEmptyList_WhenNoTemplates()
{
// Act
var response = await _client.GetAsync("/api/v2/notify/templates");
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var templates = await response.Content.ReadFromJsonAsync<List<TemplateResponse>>();
Assert.NotNull(templates);
}
[Fact]
public async Task PreviewTemplate_ReturnsRenderedContent()
{
// Arrange
var request = new TemplatePreviewRequest
{
TemplateBody = "Hello {{name}}, you have {{count}} messages.",
SamplePayload = JsonSerializer.SerializeToNode(new { name = "World", count = 5 }) as System.Text.Json.Nodes.JsonObject
};
// Act
var response = await _client.PostAsJsonAsync("/api/v2/notify/templates/preview", request);
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var preview = await response.Content.ReadFromJsonAsync<TemplatePreviewResponse>();
Assert.NotNull(preview);
Assert.Contains("Hello World", preview.RenderedBody);
Assert.Contains("5", preview.RenderedBody);
}
[Fact]
public async Task ValidateTemplate_ReturnsValid_ForCorrectTemplate()
{
// Arrange
var request = new TemplatePreviewRequest
{
TemplateBody = "Hello {{name}}!"
};
// Act
var response = await _client.PostAsJsonAsync("/api/v2/notify/templates/validate", request);
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var result = await response.Content.ReadFromJsonAsync<JsonElement>();
Assert.True(result.GetProperty("isValid").GetBoolean());
}
[Fact]
public async Task ValidateTemplate_ReturnsInvalid_ForBrokenTemplate()
{
// Arrange
var request = new TemplatePreviewRequest
{
TemplateBody = "Hello {{name} - missing closing brace"
};
// Act
var response = await _client.PostAsJsonAsync("/api/v2/notify/templates/validate", request);
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var result = await response.Content.ReadFromJsonAsync<JsonElement>();
Assert.False(result.GetProperty("isValid").GetBoolean());
}
#endregion
#region Incidents API Tests
[Fact]
public async Task GetIncidents_ReturnsIncidentList()
{
// Act
var response = await _client.GetAsync("/api/v2/notify/incidents");
// Assert
Assert.Equal(HttpStatusCode.OK, response.StatusCode);
var result = await response.Content.ReadFromJsonAsync<IncidentListResponse>();
Assert.NotNull(result);
Assert.NotNull(result.Incidents);
}
[Fact]
public async Task AckIncident_ReturnsNoContent()
{
// Arrange
var request = new IncidentAckRequest
{
Actor = "test-user",
Comment = "Acknowledged"
};
// Act
var response = await _client.PostAsJsonAsync("/api/v2/notify/incidents/incident-001/ack", request);
// Assert
Assert.Equal(HttpStatusCode.NoContent, response.StatusCode);
}
#endregion
#region Error Handling Tests
[Fact]
public async Task AllEndpoints_ReturnBadRequest_WhenTenantMissing()
{
// Arrange - drop the tenant header from this test's client; a bare HttpClient would bypass the in-memory test server
_client.DefaultRequestHeaders.Remove("X-StellaOps-Tenant");
// Act
var response = await _client.GetAsync("/api/v2/notify/rules");
// Assert - should fail without tenant header (the exact status code depends on the endpoint implementation)
Assert.False(response.IsSuccessStatusCode);
}
#endregion
#region Test Repositories
private sealed class InMemoryRuleRepository : INotifyRuleRepository
{
private readonly Dictionary<string, NotifyRule> _rules = new();
public Task UpsertAsync(NotifyRule rule, CancellationToken cancellationToken = default)
{
var key = $"{rule.TenantId}:{rule.RuleId}";
_rules[key] = rule;
return Task.CompletedTask;
}
public Task<NotifyRule?> GetAsync(string tenantId, string ruleId, CancellationToken cancellationToken = default)
{
var key = $"{tenantId}:{ruleId}";
return Task.FromResult(_rules.GetValueOrDefault(key));
}
public Task<IReadOnlyList<NotifyRule>> ListAsync(string tenantId, CancellationToken cancellationToken = default)
{
var result = _rules.Values.Where(r => r.TenantId == tenantId).ToList();
return Task.FromResult<IReadOnlyList<NotifyRule>>(result);
}
public Task DeleteAsync(string tenantId, string ruleId, CancellationToken cancellationToken = default)
{
var key = $"{tenantId}:{ruleId}";
_rules.Remove(key);
return Task.CompletedTask;
}
}
private sealed class InMemoryTemplateRepository : INotifyTemplateRepository
{
private readonly Dictionary<string, NotifyTemplate> _templates = new();
public Task UpsertAsync(NotifyTemplate template, CancellationToken cancellationToken = default)
{
var key = $"{template.TenantId}:{template.TemplateId}";
_templates[key] = template;
return Task.CompletedTask;
}
public Task<NotifyTemplate?> GetAsync(string tenantId, string templateId, CancellationToken cancellationToken = default)
{
var key = $"{tenantId}:{templateId}";
return Task.FromResult(_templates.GetValueOrDefault(key));
}
public Task<IReadOnlyList<NotifyTemplate>> ListAsync(string tenantId, CancellationToken cancellationToken = default)
{
var result = _templates.Values.Where(t => t.TenantId == tenantId).ToList();
return Task.FromResult<IReadOnlyList<NotifyTemplate>>(result);
}
public Task DeleteAsync(string tenantId, string templateId, CancellationToken cancellationToken = default)
{
var key = $"{tenantId}:{templateId}";
_templates.Remove(key);
return Task.CompletedTask;
}
}
#endregion
}

@@ -0,0 +1,308 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Fallback;
namespace StellaOps.Notifier.Tests.Fallback;
public class InMemoryFallbackHandlerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly FallbackHandlerOptions _options;
private readonly InMemoryFallbackHandler _fallbackHandler;
public InMemoryFallbackHandlerTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new FallbackHandlerOptions
{
Enabled = true,
MaxAttempts = 3,
DefaultChains = new Dictionary<NotifyChannelType, List<NotifyChannelType>>
{
[NotifyChannelType.Slack] = [NotifyChannelType.Teams, NotifyChannelType.Email],
[NotifyChannelType.Teams] = [NotifyChannelType.Slack, NotifyChannelType.Email],
[NotifyChannelType.Email] = [NotifyChannelType.Webhook],
[NotifyChannelType.Webhook] = []
}
};
_fallbackHandler = new InMemoryFallbackHandler(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryFallbackHandler>.Instance);
}
[Fact]
public async Task GetFallbackAsync_FirstFailure_ReturnsNextChannel()
{
// Arrange
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Connection timeout");
// Act
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
// Assert
Assert.True(result.HasFallback);
Assert.Equal(NotifyChannelType.Teams, result.NextChannelType);
Assert.Equal(2, result.AttemptNumber);
Assert.Equal(3, result.TotalChannels); // Slack -> Teams -> Email
}
[Fact]
public async Task GetFallbackAsync_SecondFailure_ReturnsThirdChannel()
{
// Arrange
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Connection timeout");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Teams, "Rate limited");
// Act
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Teams, "delivery1");
// Assert
Assert.True(result.HasFallback);
Assert.Equal(NotifyChannelType.Email, result.NextChannelType);
Assert.Equal(3, result.AttemptNumber);
}
[Fact]
public async Task GetFallbackAsync_AllChannelsFailed_ReturnsExhausted()
{
// Arrange - exhaust all channels (Slack -> Teams -> Email)
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Teams, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Teams, "delivery1");
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Email, "Failed");
// Act
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Email, "delivery1");
// Assert
Assert.False(result.HasFallback);
Assert.True(result.IsExhausted);
Assert.Null(result.NextChannelType);
Assert.Equal(3, result.FailedChannels.Count);
}
[Fact]
public async Task GetFallbackAsync_NoFallbackConfigured_ReturnsNoFallback()
{
// Arrange & Act - Webhook has an empty fallback chain
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Webhook, "Failed");
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Webhook, "delivery1");
// Assert
Assert.False(result.HasFallback);
Assert.Contains("No fallback", result.ExhaustionReason);
}
[Fact]
public async Task GetFallbackAsync_DisabledHandler_ReturnsNoFallback()
{
// Arrange
var disabledOptions = new FallbackHandlerOptions { Enabled = false };
var disabledHandler = new InMemoryFallbackHandler(
Options.Create(disabledOptions),
_timeProvider,
NullLogger<InMemoryFallbackHandler>.Instance);
// Act
var result = await disabledHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
// Assert
Assert.False(result.HasFallback);
}
[Fact]
public async Task RecordSuccessAsync_MarksDeliveryAsSucceeded()
{
// Arrange
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
// Act
await _fallbackHandler.RecordSuccessAsync("tenant1", "delivery1", NotifyChannelType.Teams);
// Assert
var stats = await _fallbackHandler.GetStatisticsAsync("tenant1");
Assert.Equal(1, stats.FallbackSuccesses);
}
[Fact]
public async Task GetFallbackChainAsync_ReturnsDefaultChain()
{
// Act
var chain = await _fallbackHandler.GetFallbackChainAsync("tenant1", NotifyChannelType.Slack);
// Assert
Assert.Equal(2, chain.Count);
Assert.Equal(NotifyChannelType.Teams, chain[0]);
Assert.Equal(NotifyChannelType.Email, chain[1]);
}
[Fact]
public async Task SetFallbackChainAsync_CreatesTenantSpecificChain()
{
// Act
await _fallbackHandler.SetFallbackChainAsync(
"tenant1",
NotifyChannelType.Slack,
[NotifyChannelType.Webhook, NotifyChannelType.Email],
"admin");
var chain = await _fallbackHandler.GetFallbackChainAsync("tenant1", NotifyChannelType.Slack);
// Assert
Assert.Equal(2, chain.Count);
Assert.Equal(NotifyChannelType.Webhook, chain[0]);
Assert.Equal(NotifyChannelType.Email, chain[1]);
}
[Fact]
public async Task SetFallbackChainAsync_DoesNotAffectOtherTenants()
{
// Arrange
await _fallbackHandler.SetFallbackChainAsync(
"tenant1",
NotifyChannelType.Slack,
[NotifyChannelType.Webhook],
"admin");
// Act
var tenant1Chain = await _fallbackHandler.GetFallbackChainAsync("tenant1", NotifyChannelType.Slack);
var tenant2Chain = await _fallbackHandler.GetFallbackChainAsync("tenant2", NotifyChannelType.Slack);
// Assert
Assert.Single(tenant1Chain);
Assert.Equal(NotifyChannelType.Webhook, tenant1Chain[0]);
Assert.Equal(2, tenant2Chain.Count); // Default chain
Assert.Equal(NotifyChannelType.Teams, tenant2Chain[0]);
}
[Fact]
public async Task GetStatisticsAsync_ReturnsAccurateStats()
{
// Arrange - Create various delivery scenarios
// Delivery 1: Primary success
await _fallbackHandler.RecordSuccessAsync("tenant1", "delivery1", NotifyChannelType.Slack);
// Delivery 2: Fallback success
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery2", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery2");
await _fallbackHandler.RecordSuccessAsync("tenant1", "delivery2", NotifyChannelType.Teams);
// Delivery 3: Exhausted
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery3", NotifyChannelType.Webhook, "Failed");
// Act
var stats = await _fallbackHandler.GetStatisticsAsync("tenant1");
// Assert
Assert.Equal("tenant1", stats.TenantId);
Assert.Equal(3, stats.TotalDeliveries);
Assert.Equal(1, stats.PrimarySuccesses);
Assert.Equal(1, stats.FallbackSuccesses);
Assert.Equal(1, stats.FallbackAttempts);
}
[Fact]
public async Task GetStatisticsAsync_FiltersWithinWindow()
{
// Arrange
await _fallbackHandler.RecordSuccessAsync("tenant1", "old-delivery", NotifyChannelType.Slack);
_timeProvider.Advance(TimeSpan.FromHours(25));
await _fallbackHandler.RecordSuccessAsync("tenant1", "recent-delivery", NotifyChannelType.Slack);
// Act - Get stats for last 24 hours
var stats = await _fallbackHandler.GetStatisticsAsync("tenant1", TimeSpan.FromHours(24));
// Assert
Assert.Equal(1, stats.TotalDeliveries);
}
[Fact]
public async Task ClearDeliveryStateAsync_RemovesDeliveryTracking()
{
// Arrange
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
// Act
await _fallbackHandler.ClearDeliveryStateAsync("tenant1", "delivery1");
// Get fallback again - should start fresh
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Failed again");
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
// Assert - Should be back to first fallback attempt
Assert.Equal(NotifyChannelType.Teams, result.NextChannelType);
Assert.Equal(2, result.AttemptNumber);
}
[Fact]
public async Task GetFallbackAsync_MaxAttemptsExceeded_ReturnsExhausted()
{
// Arrange - MaxAttempts is 3, but the chain below has 5 channels (Slack + 4 fallbacks),
// so the attempt cap rather than the chain length should end the sequence
await _fallbackHandler.SetFallbackChainAsync(
"tenant1",
NotifyChannelType.Slack,
[NotifyChannelType.Teams, NotifyChannelType.Email, NotifyChannelType.Webhook, NotifyChannelType.Custom],
"admin");
// Fail through 3 attempts (max)
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Teams, "Failed");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Teams, "delivery1");
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Email, "Failed");
// Act - 4th attempt should be blocked by MaxAttempts
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Email, "delivery1");
// Assert
Assert.True(result.IsExhausted);
}
[Fact]
public async Task RecordFailureAsync_TracksMultipleFailures()
{
// Arrange & Act
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Slack, "Timeout");
await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Slack, "delivery1");
await _fallbackHandler.RecordFailureAsync("tenant1", "delivery1", NotifyChannelType.Teams, "Rate limited");
var result = await _fallbackHandler.GetFallbackAsync("tenant1", NotifyChannelType.Teams, "delivery1");
// Assert
Assert.Equal(2, result.FailedChannels.Count);
Assert.Contains(result.FailedChannels, f => f.ChannelType == NotifyChannelType.Slack && f.Reason == "Timeout");
Assert.Contains(result.FailedChannels, f => f.ChannelType == NotifyChannelType.Teams && f.Reason == "Rate limited");
}
[Fact]
public async Task GetStatisticsAsync_TracksFailuresByChannel()
{
// Arrange
await _fallbackHandler.RecordFailureAsync("tenant1", "d1", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.RecordFailureAsync("tenant1", "d2", NotifyChannelType.Slack, "Failed");
await _fallbackHandler.RecordFailureAsync("tenant1", "d3", NotifyChannelType.Teams, "Failed");
// Act
var stats = await _fallbackHandler.GetStatisticsAsync("tenant1");
// Assert
Assert.Equal(2, stats.FailuresByChannel[NotifyChannelType.Slack]);
Assert.Equal(1, stats.FailuresByChannel[NotifyChannelType.Teams]);
}
}

@@ -0,0 +1,398 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Localization;
namespace StellaOps.Notifier.Tests.Localization;
public class InMemoryLocalizationServiceTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly LocalizationServiceOptions _options;
private readonly InMemoryLocalizationService _localizationService;
public InMemoryLocalizationServiceTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new LocalizationServiceOptions
{
DefaultLocale = "en-US",
EnableFallback = true,
EnableCaching = true,
CacheDuration = TimeSpan.FromMinutes(15),
ReturnKeyWhenMissing = true,
PlaceholderFormat = "named"
};
_localizationService = new InMemoryLocalizationService(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryLocalizationService>.Instance);
}
[Fact]
public async Task GetStringAsync_SystemBundle_ReturnsValue()
{
// Act - system bundles are seeded automatically
var value = await _localizationService.GetStringAsync("tenant1", "storm.detected.title", "en-US");
// Assert
Assert.NotNull(value);
Assert.Equal("Notification Storm Detected", value);
}
[Fact]
public async Task GetStringAsync_GermanLocale_ReturnsGermanValue()
{
// Act
var value = await _localizationService.GetStringAsync("tenant1", "storm.detected.title", "de-DE");
// Assert
Assert.NotNull(value);
Assert.Equal("Benachrichtigungssturm erkannt", value);
}
[Fact]
public async Task GetStringAsync_FrenchLocale_ReturnsFrenchValue()
{
// Act
var value = await _localizationService.GetStringAsync("tenant1", "storm.detected.title", "fr-FR");
// Assert
Assert.NotNull(value);
Assert.Equal("Tempête de notifications détectée", value);
}
[Fact]
public async Task GetStringAsync_UnknownKey_ReturnsKey()
{
// Act
var value = await _localizationService.GetStringAsync("tenant1", "unknown.key", "en-US");
// Assert (when ReturnKeyWhenMissing = true)
Assert.Equal("unknown.key", value);
}
[Fact]
public async Task GetStringAsync_LocaleFallback_UsesDefaultLocale()
{
// Act - Japanese locale (not configured) should fall back to en-US
var value = await _localizationService.GetStringAsync("tenant1", "storm.detected.title", "ja-JP");
// Assert - should get en-US value
Assert.Equal("Notification Storm Detected", value);
}
[Fact]
public async Task GetFormattedStringAsync_ReplacesPlaceholders()
{
// Act
var parameters = new Dictionary<string, object>
{
["stormKey"] = "critical.alert",
["count"] = 50,
["window"] = "5 minutes"
};
var value = await _localizationService.GetFormattedStringAsync(
"tenant1", "storm.detected.body", "en-US", parameters);
// Assert
Assert.NotNull(value);
Assert.Contains("critical.alert", value);
Assert.Contains("50", value);
Assert.Contains("5 minutes", value);
}
[Fact]
public async Task UpsertBundleAsync_CreatesTenantBundle()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "tenant-bundle",
TenantId = "tenant1",
Locale = "en-US",
Namespace = "custom",
Strings = new Dictionary<string, string>
{
["custom.greeting"] = "Hello, World!"
},
Description = "Custom tenant bundle"
};
// Act
var result = await _localizationService.UpsertBundleAsync(bundle, "admin");
// Assert
Assert.True(result.Success);
Assert.True(result.IsNew);
Assert.Equal("tenant-bundle", result.BundleId);
// Verify string is accessible
var greeting = await _localizationService.GetStringAsync("tenant1", "custom.greeting", "en-US");
Assert.Equal("Hello, World!", greeting);
}
[Fact]
public async Task UpsertBundleAsync_UpdatesExistingBundle()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "update-test",
TenantId = "tenant1",
Locale = "en-US",
Strings = new Dictionary<string, string>
{
["test.key"] = "Original value"
}
};
await _localizationService.UpsertBundleAsync(bundle, "admin");
// Act - update with new value
var updatedBundle = bundle with
{
Strings = new Dictionary<string, string>
{
["test.key"] = "Updated value"
}
};
var result = await _localizationService.UpsertBundleAsync(updatedBundle, "admin");
// Assert
Assert.True(result.Success);
Assert.False(result.IsNew);
var value = await _localizationService.GetStringAsync("tenant1", "test.key", "en-US");
Assert.Equal("Updated value", value);
}
[Fact]
public async Task DeleteBundleAsync_RemovesBundle()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "delete-test",
TenantId = "tenant1",
Locale = "en-US",
Strings = new Dictionary<string, string>
{
["delete.key"] = "Will be deleted"
}
};
await _localizationService.UpsertBundleAsync(bundle, "admin");
// Act
var deleted = await _localizationService.DeleteBundleAsync("tenant1", "delete-test", "admin");
// Assert
Assert.True(deleted);
var bundles = await _localizationService.ListBundlesAsync("tenant1");
Assert.DoesNotContain(bundles, b => b.BundleId == "delete-test");
}
[Fact]
public async Task ListBundlesAsync_ReturnsAllTenantBundles()
{
// Arrange
var bundle1 = new LocalizationBundle
{
BundleId = "list-test-1",
TenantId = "tenant1",
Locale = "en-US",
Strings = new Dictionary<string, string> { ["key1"] = "value1" }
};
var bundle2 = new LocalizationBundle
{
BundleId = "list-test-2",
TenantId = "tenant1",
Locale = "de-DE",
Strings = new Dictionary<string, string> { ["key2"] = "value2" }
};
var bundle3 = new LocalizationBundle
{
BundleId = "other-tenant",
TenantId = "tenant2",
Locale = "en-US",
Strings = new Dictionary<string, string> { ["key3"] = "value3" }
};
await _localizationService.UpsertBundleAsync(bundle1, "admin");
await _localizationService.UpsertBundleAsync(bundle2, "admin");
await _localizationService.UpsertBundleAsync(bundle3, "admin");
// Act
var tenant1Bundles = await _localizationService.ListBundlesAsync("tenant1");
// Assert
Assert.Equal(2, tenant1Bundles.Count);
Assert.Contains(tenant1Bundles, b => b.BundleId == "list-test-1");
Assert.Contains(tenant1Bundles, b => b.BundleId == "list-test-2");
Assert.DoesNotContain(tenant1Bundles, b => b.BundleId == "other-tenant");
}
[Fact]
public async Task GetSupportedLocalesAsync_ReturnsAvailableLocales()
{
// Act
var locales = await _localizationService.GetSupportedLocalesAsync("tenant1");
// Assert - should include seeded system locales
Assert.Contains("en-US", locales);
Assert.Contains("de-DE", locales);
Assert.Contains("fr-FR", locales);
}
[Fact]
public async Task GetBundleAsync_ReturnsMergedStrings()
{
// Arrange - add tenant bundle that overrides a system string
var tenantBundle = new LocalizationBundle
{
BundleId = "tenant-override",
TenantId = "tenant1",
Locale = "en-US",
Priority = 10, // Higher priority than system (0)
Strings = new Dictionary<string, string>
{
["storm.detected.title"] = "Custom Storm Title",
["tenant.custom"] = "Custom Value"
}
};
await _localizationService.UpsertBundleAsync(tenantBundle, "admin");
// Act
var bundle = await _localizationService.GetBundleAsync("tenant1", "en-US");
// Assert - should have both system and tenant strings, with tenant override
Assert.True(bundle.ContainsKey("storm.detected.title"));
Assert.Equal("Custom Storm Title", bundle["storm.detected.title"]);
Assert.True(bundle.ContainsKey("tenant.custom"));
Assert.True(bundle.ContainsKey("fallback.attempted.title")); // System string
}
[Fact]
public void Validate_ValidBundle_ReturnsValid()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "valid-bundle",
TenantId = "tenant1",
Locale = "en-US",
Strings = new Dictionary<string, string>
{
["key1"] = "value1"
}
};
// Act
var result = _localizationService.Validate(bundle);
// Assert
Assert.True(result.IsValid);
Assert.Empty(result.Errors);
}
[Fact]
public void Validate_MissingBundleId_ReturnsInvalid()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "",
TenantId = "tenant1",
Locale = "en-US",
Strings = new Dictionary<string, string> { ["key"] = "value" }
};
// Act
var result = _localizationService.Validate(bundle);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Contains("Bundle ID"));
}
[Fact]
public void Validate_MissingLocale_ReturnsInvalid()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "test",
TenantId = "tenant1",
Locale = "",
Strings = new Dictionary<string, string> { ["key"] = "value" }
};
// Act
var result = _localizationService.Validate(bundle);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Contains("Locale"));
}
[Fact]
public void Validate_EmptyStrings_ReturnsInvalid()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "test",
TenantId = "tenant1",
Locale = "en-US",
Strings = new Dictionary<string, string>()
};
// Act
var result = _localizationService.Validate(bundle);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Contains("at least one string"));
}
[Fact]
public async Task GetStringAsync_CachesResults()
{
// Act - first call
var value1 = await _localizationService.GetStringAsync("tenant1", "storm.detected.title", "en-US");
// Advance time slightly (within cache duration)
_timeProvider.Advance(TimeSpan.FromMinutes(5));
// Second call should hit cache
var value2 = await _localizationService.GetStringAsync("tenant1", "storm.detected.title", "en-US");
// Assert
Assert.Equal(value1, value2);
}
[Fact]
public async Task GetFormattedStringAsync_FormatsNumbers()
{
// Arrange
var bundle = new LocalizationBundle
{
BundleId = "number-test",
TenantId = "tenant1",
Locale = "de-DE",
Strings = new Dictionary<string, string>
{
["number.test"] = "Total: {{count}} items"
}
};
await _localizationService.UpsertBundleAsync(bundle, "admin");
// Act
var parameters = new Dictionary<string, object> { ["count"] = 1234567 };
var value = await _localizationService.GetFormattedStringAsync(
"tenant1", "number.test", "de-DE", parameters);
// Assert - German number formatting uses periods as thousands separator
Assert.Contains("1.234.567", value);
}
}

@@ -0,0 +1,492 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Observability;
namespace StellaOps.Notifier.Tests.Observability;
public class ChaosTestRunnerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly ChaosTestOptions _options;
private readonly InMemoryChaosTestRunner _runner;
public ChaosTestRunnerTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new ChaosTestOptions
{
Enabled = true,
MaxConcurrentExperiments = 5,
MaxExperimentDuration = TimeSpan.FromHours(1),
RequireTenantTarget = false
};
_runner = new InMemoryChaosTestRunner(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryChaosTestRunner>.Instance);
}
[Fact]
public async Task StartExperimentAsync_CreatesExperiment()
{
// Arrange
var config = new ChaosExperimentConfig
{
Name = "Test Outage",
InitiatedBy = "test-user",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Outage,
Duration = TimeSpan.FromMinutes(5)
};
// Act
var experiment = await _runner.StartExperimentAsync(config);
// Assert
Assert.NotNull(experiment);
Assert.Equal(ChaosExperimentStatus.Running, experiment.Status);
Assert.Equal("Test Outage", experiment.Config.Name);
Assert.NotNull(experiment.StartedAt);
}
[Fact]
public async Task StartExperimentAsync_WhenDisabled_Throws()
{
// Arrange
var disabledOptions = new ChaosTestOptions { Enabled = false };
var runner = new InMemoryChaosTestRunner(
Options.Create(disabledOptions),
_timeProvider,
NullLogger<InMemoryChaosTestRunner>.Instance);
var config = new ChaosExperimentConfig
{
Name = "Test",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
};
// Act & Assert
await Assert.ThrowsAsync<InvalidOperationException>(() => runner.StartExperimentAsync(config));
}
[Fact]
public async Task StartExperimentAsync_ExceedsMaxDuration_Throws()
{
// Arrange
var config = new ChaosExperimentConfig
{
Name = "Long Experiment",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage,
Duration = TimeSpan.FromHours(2) // Exceeds max of 1 hour
};
// Act & Assert
await Assert.ThrowsAsync<InvalidOperationException>(() => _runner.StartExperimentAsync(config));
}
[Fact]
public async Task StartExperimentAsync_MaxConcurrentReached_Throws()
{
// Arrange - start max number of experiments
for (var i = 0; i < _options.MaxConcurrentExperiments; i++)
{
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = $"Experiment {i}",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
});
}
// Act & Assert
await Assert.ThrowsAsync<InvalidOperationException>(() =>
_runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "One too many",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
}));
}
[Fact]
public async Task StopExperimentAsync_StopsExperiment()
{
// Arrange
var experiment = await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Test",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
});
// Act
await _runner.StopExperimentAsync(experiment.Id);
// Assert
var stopped = await _runner.GetExperimentAsync(experiment.Id);
Assert.NotNull(stopped);
Assert.Equal(ChaosExperimentStatus.Stopped, stopped.Status);
Assert.NotNull(stopped.EndedAt);
}
[Fact]
public async Task ShouldFailAsync_OutageFault_ReturnsFault()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Email Outage",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Outage
});
// Act
var decision = await _runner.ShouldFailAsync("tenant1", "email");
// Assert
Assert.True(decision.ShouldFail);
Assert.Equal(ChaosFaultType.Outage, decision.FaultType);
Assert.NotNull(decision.InjectedError);
}
[Fact]
public async Task ShouldFailAsync_NoMatchingExperiment_ReturnsNoFault()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Email Outage",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Outage
});
// Act - different tenant
var decision = await _runner.ShouldFailAsync("tenant2", "email");
// Assert
Assert.False(decision.ShouldFail);
}
[Fact]
public async Task ShouldFailAsync_WrongChannelType_ReturnsNoFault()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Email Outage",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Outage
});
// Act - different channel type
var decision = await _runner.ShouldFailAsync("tenant1", "slack");
// Assert
Assert.False(decision.ShouldFail);
}
[Fact]
public async Task ShouldFailAsync_LatencyFault_InjectsLatency()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Latency Test",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Latency,
FaultConfig = new ChaosFaultConfig
{
MinLatency = TimeSpan.FromSeconds(1),
MaxLatency = TimeSpan.FromSeconds(5)
}
});
// Act
var decision = await _runner.ShouldFailAsync("tenant1", "email");
// Assert
Assert.False(decision.ShouldFail); // Latency doesn't cause failure
Assert.NotNull(decision.InjectedLatency);
Assert.InRange(decision.InjectedLatency.Value.TotalSeconds, 1, 5);
}
[Fact]
public async Task ShouldFailAsync_PartialFailure_UsesFailureRate()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Partial Failure",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.PartialFailure,
FaultConfig = new ChaosFaultConfig
{
FailureRate = 0.5,
Seed = 42 // Fixed seed for reproducibility
}
});
// Act - run multiple times
var failures = 0;
for (var i = 0; i < 100; i++)
{
var decision = await _runner.ShouldFailAsync("tenant1", "email");
if (decision.ShouldFail) failures++;
}
// Assert - should be roughly 50% failures (with some variance)
Assert.InRange(failures, 30, 70);
}
[Fact]
public async Task ShouldFailAsync_RateLimit_EnforcesLimit()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Rate Limit",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.RateLimit,
FaultConfig = new ChaosFaultConfig
{
RateLimitPerMinute = 5
}
});
// Act - first 5 should pass
for (var i = 0; i < 5; i++)
{
var decision = await _runner.ShouldFailAsync("tenant1", "email");
Assert.False(decision.ShouldFail);
}
// 6th should fail
var failedDecision = await _runner.ShouldFailAsync("tenant1", "email");
// Assert
Assert.True(failedDecision.ShouldFail);
Assert.Equal(429, failedDecision.InjectedStatusCode);
}
[Fact]
public async Task ShouldFailAsync_ExperimentExpires_StopsMatching()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Short Experiment",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Outage,
Duration = TimeSpan.FromMinutes(5)
});
// Act - advance time past duration
_timeProvider.Advance(TimeSpan.FromMinutes(10));
var decision = await _runner.ShouldFailAsync("tenant1", "email");
// Assert
Assert.False(decision.ShouldFail);
}
[Fact]
public async Task ShouldFailAsync_MaxOperationsReached_StopsMatching()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Limited Experiment",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.Outage,
MaxAffectedOperations = 3
});
// Act - consume all operations
for (var i = 0; i < 3; i++)
{
var d = await _runner.ShouldFailAsync("tenant1", "email");
Assert.True(d.ShouldFail);
}
// 4th should not match
var decision = await _runner.ShouldFailAsync("tenant1", "email");
// Assert
Assert.False(decision.ShouldFail);
}
[Fact]
public async Task RecordOutcomeAsync_RecordsOutcome()
{
// Arrange
var experiment = await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Test",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
});
// Act
await _runner.RecordOutcomeAsync(experiment.Id, new ChaosOutcome
{
Type = ChaosOutcomeType.FaultInjected,
ChannelType = "email",
TenantId = "tenant1",
FallbackTriggered = true
});
var results = await _runner.GetResultsAsync(experiment.Id);
// Assert
Assert.Equal(1, results.TotalAffected);
Assert.Equal(1, results.FailedOperations);
Assert.Equal(1, results.FallbackTriggered);
}
[Fact]
public async Task GetResultsAsync_CalculatesStatistics()
{
// Arrange
var experiment = await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Test",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Latency
});
// Record various outcomes
await _runner.RecordOutcomeAsync(experiment.Id, new ChaosOutcome
{
Type = ChaosOutcomeType.LatencyInjected,
ChannelType = "email",
Duration = TimeSpan.FromMilliseconds(100)
});
await _runner.RecordOutcomeAsync(experiment.Id, new ChaosOutcome
{
Type = ChaosOutcomeType.LatencyInjected,
ChannelType = "email",
Duration = TimeSpan.FromMilliseconds(200)
});
await _runner.RecordOutcomeAsync(experiment.Id, new ChaosOutcome
{
Type = ChaosOutcomeType.FaultInjected,
ChannelType = "slack",
FallbackTriggered = true
});
// Act
var results = await _runner.GetResultsAsync(experiment.Id);
// Assert
Assert.Equal(3, results.TotalAffected);
Assert.Equal(1, results.FailedOperations);
Assert.Equal(1, results.FallbackTriggered);
Assert.NotNull(results.AverageInjectedLatency);
Assert.Equal(150, results.AverageInjectedLatency.Value.TotalMilliseconds);
Assert.Equal(2, results.ByChannelType["email"].TotalAffected);
Assert.Equal(1, results.ByChannelType["slack"].TotalAffected);
}
[Fact]
public async Task ListExperimentsAsync_FiltersByStatus()
{
// Arrange
var running = await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Running",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
});
var toStop = await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "To Stop",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage
});
await _runner.StopExperimentAsync(toStop.Id);
// Act
var runningList = await _runner.ListExperimentsAsync(ChaosExperimentStatus.Running);
var stoppedList = await _runner.ListExperimentsAsync(ChaosExperimentStatus.Stopped);
// Assert
Assert.Single(runningList);
Assert.Single(stoppedList);
Assert.Equal(running.Id, runningList[0].Id);
Assert.Equal(toStop.Id, stoppedList[0].Id);
}
[Fact]
public async Task CleanupAsync_RemovesOldExperiments()
{
// Arrange
var experiment = await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Old Experiment",
InitiatedBy = "test-user",
FaultType = ChaosFaultType.Outage,
Duration = TimeSpan.FromMinutes(5)
});
// Complete the experiment
_timeProvider.Advance(TimeSpan.FromMinutes(10));
await _runner.GetExperimentAsync(experiment.Id); // Triggers status update
// Advance time beyond cleanup threshold
_timeProvider.Advance(TimeSpan.FromDays(10));
// Act
var removed = await _runner.CleanupAsync(TimeSpan.FromDays(7));
// Assert
Assert.Equal(1, removed);
var result = await _runner.GetExperimentAsync(experiment.Id);
Assert.Null(result);
}
[Fact]
public async Task ErrorResponseFault_ReturnsConfiguredStatusCode()
{
// Arrange
await _runner.StartExperimentAsync(new ChaosExperimentConfig
{
Name = "Error Response",
InitiatedBy = "test-user",
TenantId = "tenant1",
TargetChannelTypes = ["email"],
FaultType = ChaosFaultType.ErrorResponse,
FaultConfig = new ChaosFaultConfig
{
ErrorStatusCode = 503,
ErrorMessage = "Service Unavailable"
}
});
// Act
var decision = await _runner.ShouldFailAsync("tenant1", "email");
// Assert
Assert.True(decision.ShouldFail);
Assert.Equal(503, decision.InjectedStatusCode);
Assert.Contains("Service Unavailable", decision.InjectedError);
}
}

@@ -0,0 +1,495 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Observability;
namespace StellaOps.Notifier.Tests.Observability;
public class DeadLetterHandlerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly DeadLetterOptions _options;
private readonly InMemoryDeadLetterHandler _handler;
public DeadLetterHandlerTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new DeadLetterOptions
{
Enabled = true,
MaxRetries = 3,
RetryDelay = TimeSpan.FromMinutes(5),
MaxEntriesPerTenant = 1000
};
_handler = new InMemoryDeadLetterHandler(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryDeadLetterHandler>.Instance);
}
[Fact]
public async Task DeadLetterAsync_AddsEntry()
{
// Arrange
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Connection timeout",
OriginalPayload = "{ \"to\": \"user@example.com\" }",
ErrorDetails = "SMTP timeout after 30s",
AttemptCount = 3
};
// Act
await _handler.DeadLetterAsync(entry);
// Assert
var entries = await _handler.GetEntriesAsync("tenant1");
Assert.Single(entries);
Assert.Equal("delivery-001", entries[0].DeliveryId);
}
[Fact]
public async Task DeadLetterAsync_WhenDisabled_DoesNotAdd()
{
// Arrange
var disabledOptions = new DeadLetterOptions { Enabled = false };
var handler = new InMemoryDeadLetterHandler(
Options.Create(disabledOptions),
_timeProvider,
NullLogger<InMemoryDeadLetterHandler>.Instance);
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
};
// Act
await handler.DeadLetterAsync(entry);
// Assert
var entries = await handler.GetEntriesAsync("tenant1");
Assert.Empty(entries);
}
[Fact]
public async Task GetEntryAsync_ReturnsEntry()
{
// Arrange
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
};
await _handler.DeadLetterAsync(entry);
// Get the entry ID from the list
var entries = await _handler.GetEntriesAsync("tenant1");
var entryId = entries[0].Id;
// Act
var retrieved = await _handler.GetEntryAsync("tenant1", entryId);
// Assert
Assert.NotNull(retrieved);
Assert.Equal("delivery-001", retrieved.DeliveryId);
}
[Fact]
public async Task GetEntryAsync_WrongTenant_ReturnsNull()
{
// Arrange
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
};
await _handler.DeadLetterAsync(entry);
var entries = await _handler.GetEntriesAsync("tenant1");
var entryId = entries[0].Id;
// Act
var retrieved = await _handler.GetEntryAsync("tenant2", entryId);
// Assert
Assert.Null(retrieved);
}
[Fact]
public async Task RetryAsync_UpdatesStatus()
{
// Arrange
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
};
await _handler.DeadLetterAsync(entry);
var entries = await _handler.GetEntriesAsync("tenant1");
var entryId = entries[0].Id;
// Act
var result = await _handler.RetryAsync("tenant1", entryId, "admin");
// Assert
Assert.True(result.Scheduled);
Assert.Equal(entryId, result.EntryId);
var updated = await _handler.GetEntryAsync("tenant1", entryId);
Assert.NotNull(updated);
Assert.Equal(DeadLetterStatus.PendingRetry, updated.Status);
}
[Fact]
public async Task RetryAsync_ExceedsMaxRetries_Throws()
{
// Arrange
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error",
RetryCount = 3 // Already at max
};
await _handler.DeadLetterAsync(entry);
var entries = await _handler.GetEntriesAsync("tenant1");
var entryId = entries[0].Id;
// Act & Assert
await Assert.ThrowsAsync<InvalidOperationException>(() =>
_handler.RetryAsync("tenant1", entryId, "admin"));
}
[Fact]
public async Task DiscardAsync_UpdatesStatus()
{
// Arrange
var entry = new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
};
await _handler.DeadLetterAsync(entry);
var entries = await _handler.GetEntriesAsync("tenant1");
var entryId = entries[0].Id;
// Act
await _handler.DiscardAsync("tenant1", entryId, "Not needed", "admin");
// Assert
var updated = await _handler.GetEntryAsync("tenant1", entryId);
Assert.NotNull(updated);
Assert.Equal(DeadLetterStatus.Discarded, updated.Status);
Assert.Equal("Not needed", updated.DiscardReason);
}
[Fact]
public async Task GetEntriesAsync_FiltersByStatus()
{
// Arrange
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-002",
ChannelType = "email",
Reason = "Error"
});
var entries = await _handler.GetEntriesAsync("tenant1");
await _handler.DiscardAsync("tenant1", entries[0].Id, "Test", "admin");
// Act
var pending = await _handler.GetEntriesAsync("tenant1", status: DeadLetterStatus.Pending);
var discarded = await _handler.GetEntriesAsync("tenant1", status: DeadLetterStatus.Discarded);
// Assert
Assert.Single(pending);
Assert.Single(discarded);
}
[Fact]
public async Task GetEntriesAsync_FiltersByChannelType()
{
// Arrange
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-002",
ChannelType = "slack",
Reason = "Error"
});
// Act
var emailEntries = await _handler.GetEntriesAsync("tenant1", channelType: "email");
// Assert
Assert.Single(emailEntries);
Assert.Equal("email", emailEntries[0].ChannelType);
}
[Fact]
public async Task GetEntriesAsync_PaginatesResults()
{
// Arrange
for (var i = 0; i < 10; i++)
{
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = $"delivery-{i:D3}",
ChannelType = "email",
Reason = "Error"
});
}
// Act
var page1 = await _handler.GetEntriesAsync("tenant1", limit: 5, offset: 0);
var page2 = await _handler.GetEntriesAsync("tenant1", limit: 5, offset: 5);
// Assert
Assert.Equal(5, page1.Count);
Assert.Equal(5, page2.Count);
Assert.NotEqual(page1[0].Id, page2[0].Id);
}
[Fact]
public async Task GetStatisticsAsync_CalculatesStats()
{
// Arrange
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Timeout"
});
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-002",
ChannelType = "email",
Reason = "Timeout"
});
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-003",
ChannelType = "slack",
Reason = "Auth failed"
});
// Act
var stats = await _handler.GetStatisticsAsync("tenant1");
// Assert
Assert.Equal(3, stats.TotalEntries);
Assert.Equal(3, stats.PendingCount);
Assert.Equal(2, stats.ByChannelType["email"]);
Assert.Equal(1, stats.ByChannelType["slack"]);
Assert.Equal(2, stats.ByReason["Timeout"]);
Assert.Equal(1, stats.ByReason["Auth failed"]);
}
[Fact]
public async Task GetStatisticsAsync_FiltersToWindow()
{
// Arrange
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
_timeProvider.Advance(TimeSpan.FromHours(25));
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-002",
ChannelType = "email",
Reason = "Error"
});
// Act - get stats for last 24 hours only
var stats = await _handler.GetStatisticsAsync("tenant1", TimeSpan.FromHours(24));
// Assert
Assert.Equal(1, stats.TotalEntries);
}
[Fact]
public async Task PurgeAsync_RemovesOldEntries()
{
// Arrange
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
_timeProvider.Advance(TimeSpan.FromDays(10));
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-002",
ChannelType = "email",
Reason = "Error"
});
// Act - purge entries older than 7 days
var purged = await _handler.PurgeAsync("tenant1", TimeSpan.FromDays(7));
// Assert
Assert.Equal(1, purged);
var entries = await _handler.GetEntriesAsync("tenant1");
Assert.Single(entries);
Assert.Equal("delivery-002", entries[0].DeliveryId);
}
[Fact]
public async Task Subscribe_NotifiesObserver()
{
// Arrange
var observer = new TestDeadLetterObserver();
using var subscription = _handler.Subscribe(observer);
// Act
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
// Assert
Assert.Single(observer.ReceivedEvents);
Assert.Equal(DeadLetterEventType.Added, observer.ReceivedEvents[0].Type);
}
[Fact]
public async Task Subscribe_NotifiesOnRetry()
{
// Arrange
var observer = new TestDeadLetterObserver();
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
var entries = await _handler.GetEntriesAsync("tenant1");
var entryId = entries[0].Id;
using var subscription = _handler.Subscribe(observer);
// Act
await _handler.RetryAsync("tenant1", entryId, "admin");
// Assert
Assert.Single(observer.ReceivedEvents);
Assert.Equal(DeadLetterEventType.RetryScheduled, observer.ReceivedEvents[0].Type);
}
[Fact]
public async Task Subscribe_DisposedDoesNotNotify()
{
// Arrange
var observer = new TestDeadLetterObserver();
var subscription = _handler.Subscribe(observer);
subscription.Dispose();
// Act
await _handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = "delivery-001",
ChannelType = "email",
Reason = "Error"
});
// Assert
Assert.Empty(observer.ReceivedEvents);
}
[Fact]
public async Task MaxEntriesPerTenant_EnforcesLimit()
{
// Arrange
var limitedOptions = new DeadLetterOptions
{
Enabled = true,
MaxEntriesPerTenant = 3
};
var handler = new InMemoryDeadLetterHandler(
Options.Create(limitedOptions),
_timeProvider,
NullLogger<InMemoryDeadLetterHandler>.Instance);
// Act - add 5 entries
for (var i = 0; i < 5; i++)
{
await handler.DeadLetterAsync(new DeadLetterEntry
{
TenantId = "tenant1",
DeliveryId = $"delivery-{i:D3}",
ChannelType = "email",
Reason = "Error"
});
}
// Assert - should only have 3 entries (oldest removed)
var entries = await handler.GetEntriesAsync("tenant1");
Assert.Equal(3, entries.Count);
}
private sealed class TestDeadLetterObserver : IDeadLetterObserver
{
public List<DeadLetterEvent> ReceivedEvents { get; } = [];
public void OnDeadLetterEvent(DeadLetterEvent evt)
{
ReceivedEvents.Add(evt);
}
}
}

@@ -0,0 +1,475 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Observability;
namespace StellaOps.Notifier.Tests.Observability;
public class RetentionPolicyServiceTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly RetentionPolicyOptions _options;
private readonly InMemoryRetentionPolicyService _service;
public RetentionPolicyServiceTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new RetentionPolicyOptions
{
Enabled = true,
DefaultRetentionPeriod = TimeSpan.FromDays(90),
MinRetentionPeriod = TimeSpan.FromDays(1),
MaxRetentionPeriod = TimeSpan.FromDays(365)
};
_service = new InMemoryRetentionPolicyService(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryRetentionPolicyService>.Instance);
}
[Fact]
public async Task RegisterPolicyAsync_CreatesPolicy()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "Delivery Log Cleanup",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30),
Action = RetentionAction.Delete
};
// Act
await _service.RegisterPolicyAsync(policy);
// Assert
var retrieved = await _service.GetPolicyAsync("policy-001");
Assert.NotNull(retrieved);
Assert.Equal("Delivery Log Cleanup", retrieved.Name);
Assert.Equal(RetentionDataType.DeliveryLogs, retrieved.DataType);
}
[Fact]
public async Task RegisterPolicyAsync_DuplicateId_Throws()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "Policy 1",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
};
await _service.RegisterPolicyAsync(policy);
// Act & Assert
await Assert.ThrowsAsync<InvalidOperationException>(() =>
_service.RegisterPolicyAsync(policy));
}
[Fact]
public async Task RegisterPolicyAsync_RetentionTooShort_Throws()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "Too Short",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromHours(1) // Less than 1 day minimum
};
// Act & Assert
await Assert.ThrowsAsync<ArgumentException>(() =>
_service.RegisterPolicyAsync(policy));
}
[Fact]
public async Task RegisterPolicyAsync_RetentionTooLong_Throws()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "Too Long",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(500) // More than 365 days maximum
};
// Act & Assert
await Assert.ThrowsAsync<ArgumentException>(() =>
_service.RegisterPolicyAsync(policy));
}
[Fact]
public async Task RegisterPolicyAsync_ArchiveWithoutLocation_Throws()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "Archive Without Location",
DataType = RetentionDataType.AuditLogs,
RetentionPeriod = TimeSpan.FromDays(90),
Action = RetentionAction.Archive
// Missing ArchiveLocation
};
// Act & Assert
await Assert.ThrowsAsync<ArgumentException>(() =>
_service.RegisterPolicyAsync(policy));
}
[Fact]
public async Task UpdatePolicyAsync_UpdatesPolicy()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "Original Name",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
};
await _service.RegisterPolicyAsync(policy);
// Act
var updated = policy with { Name = "Updated Name" };
await _service.UpdatePolicyAsync("policy-001", updated);
// Assert
var retrieved = await _service.GetPolicyAsync("policy-001");
Assert.NotNull(retrieved);
Assert.Equal("Updated Name", retrieved.Name);
Assert.NotNull(retrieved.ModifiedAt);
}
[Fact]
public async Task UpdatePolicyAsync_NotFound_Throws()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "nonexistent",
Name = "Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
};
// Act & Assert
await Assert.ThrowsAsync<KeyNotFoundException>(() =>
_service.UpdatePolicyAsync("nonexistent", policy));
}
[Fact]
public async Task DeletePolicyAsync_RemovesPolicy()
{
// Arrange
var policy = new RetentionPolicy
{
Id = "policy-001",
Name = "To Delete",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
};
await _service.RegisterPolicyAsync(policy);
// Act
await _service.DeletePolicyAsync("policy-001");
// Assert
var retrieved = await _service.GetPolicyAsync("policy-001");
Assert.Null(retrieved);
}
[Fact]
public async Task ListPoliciesAsync_ReturnsAllPolicies()
{
// Arrange
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Policy A",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
});
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-002",
Name = "Policy B",
DataType = RetentionDataType.Escalations,
RetentionPeriod = TimeSpan.FromDays(60)
});
// Act
var policies = await _service.ListPoliciesAsync();
// Assert
Assert.Equal(2, policies.Count);
}
[Fact]
public async Task ListPoliciesAsync_FiltersByTenant()
{
// Arrange
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Global Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30),
TenantId = null // Global
});
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-002",
Name = "Tenant Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30),
TenantId = "tenant1"
});
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-003",
Name = "Other Tenant Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30),
TenantId = "tenant2"
});
// Act
var tenant1Policies = await _service.ListPoliciesAsync("tenant1");
// Assert - should include global and tenant-specific
Assert.Equal(2, tenant1Policies.Count);
Assert.Contains(tenant1Policies, p => p.Id == "policy-001");
Assert.Contains(tenant1Policies, p => p.Id == "policy-002");
Assert.DoesNotContain(tenant1Policies, p => p.Id == "policy-003");
}
[Fact]
public async Task ExecuteRetentionAsync_WhenDisabled_ReturnsError()
{
// Arrange
var disabledOptions = new RetentionPolicyOptions { Enabled = false };
var service = new InMemoryRetentionPolicyService(
Options.Create(disabledOptions),
_timeProvider,
NullLogger<InMemoryRetentionPolicyService>.Instance);
// Act
var result = await service.ExecuteRetentionAsync();
// Assert
Assert.False(result.Success);
Assert.Single(result.Errors);
Assert.Contains("disabled", result.Errors[0].Message, StringComparison.OrdinalIgnoreCase);
}
[Fact]
public async Task ExecuteRetentionAsync_ExecutesEnabledPolicies()
{
// Arrange
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Enabled Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30),
Enabled = true
});
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-002",
Name = "Disabled Policy",
DataType = RetentionDataType.Escalations,
RetentionPeriod = TimeSpan.FromDays(30),
Enabled = false
});
// Act
var result = await _service.ExecuteRetentionAsync();
// Assert
Assert.True(result.Success);
Assert.Single(result.PoliciesExecuted);
Assert.Contains("policy-001", result.PoliciesExecuted);
}
[Fact]
public async Task ExecuteRetentionAsync_SpecificPolicy_ExecutesOnlyThat()
{
// Arrange
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Policy 1",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
});
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-002",
Name = "Policy 2",
DataType = RetentionDataType.Escalations,
RetentionPeriod = TimeSpan.FromDays(30)
});
// Act
var result = await _service.ExecuteRetentionAsync("policy-002");
// Assert
Assert.Single(result.PoliciesExecuted);
Assert.Equal("policy-002", result.PoliciesExecuted[0]);
}
[Fact]
public async Task PreviewRetentionAsync_ReturnsPreview()
{
// Arrange
_service.RegisterHandler("DeliveryLogs", new TestRetentionHandler("DeliveryLogs", 100));
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Delivery Cleanup",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
});
// Act
var preview = await _service.PreviewRetentionAsync("policy-001");
// Assert
Assert.Equal("policy-001", preview.PolicyId);
Assert.Equal(100, preview.TotalAffected);
}
[Fact]
public async Task PreviewRetentionAsync_NotFound_Throws()
{
// Act & Assert
await Assert.ThrowsAsync<KeyNotFoundException>(() =>
_service.PreviewRetentionAsync("nonexistent"));
}
[Fact]
public async Task GetExecutionHistoryAsync_ReturnsHistory()
{
// Arrange
_service.RegisterHandler("DeliveryLogs", new TestRetentionHandler("DeliveryLogs", 50));
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
});
// Execute twice
await _service.ExecuteRetentionAsync("policy-001");
_timeProvider.Advance(TimeSpan.FromHours(1));
await _service.ExecuteRetentionAsync("policy-001");
// Act
var history = await _service.GetExecutionHistoryAsync("policy-001");
// Assert
Assert.Equal(2, history.Count);
Assert.All(history, r => Assert.True(r.Success));
}
[Fact]
public async Task GetNextExecutionAsync_ReturnsNextTime()
{
// Arrange
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Scheduled Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30),
Schedule = "0 0 * * *" // Daily at midnight
});
// Act
var next = await _service.GetNextExecutionAsync("policy-001");
// Assert
Assert.NotNull(next);
}
[Fact]
public async Task GetNextExecutionAsync_NoSchedule_ReturnsNull()
{
// Arrange
await _service.RegisterPolicyAsync(new RetentionPolicy
{
Id = "policy-001",
Name = "Unscheduled Policy",
DataType = RetentionDataType.DeliveryLogs,
RetentionPeriod = TimeSpan.FromDays(30)
// No schedule
});
// Act
var next = await _service.GetNextExecutionAsync("policy-001");
// Assert
Assert.Null(next);
}
[Fact]
public void CreateDeliveryLogPolicy_CreatesValidPolicy()
{
// Act
var policy = RetentionPolicyExtensions.CreateDeliveryLogPolicy(
"delivery-logs-cleanup",
TimeSpan.FromDays(30),
"tenant1",
"admin");
// Assert
Assert.Equal("delivery-logs-cleanup", policy.Id);
Assert.Equal(RetentionDataType.DeliveryLogs, policy.DataType);
Assert.Equal(TimeSpan.FromDays(30), policy.RetentionPeriod);
Assert.Equal("tenant1", policy.TenantId);
Assert.Equal("admin", policy.CreatedBy);
}
[Fact]
public void CreateAuditArchivePolicy_CreatesValidPolicy()
{
// Act
var policy = RetentionPolicyExtensions.CreateAuditArchivePolicy(
"audit-archive",
TimeSpan.FromDays(365),
"s3://bucket/archive",
"tenant1",
"admin");
// Assert
Assert.Equal("audit-archive", policy.Id);
Assert.Equal(RetentionDataType.AuditLogs, policy.DataType);
Assert.Equal(RetentionAction.Archive, policy.Action);
Assert.Equal("s3://bucket/archive", policy.ArchiveLocation);
}
// Test double: every operation reports the fixed count supplied at construction.
private sealed class TestRetentionHandler : IRetentionHandler
{
public string DataType { get; }
private readonly long _count;
public TestRetentionHandler(string dataType, long count)
{
DataType = dataType;
_count = count;
}
public Task<long> CountAsync(RetentionQuery query, CancellationToken ct) => Task.FromResult(_count);
public Task<long> DeleteAsync(RetentionQuery query, CancellationToken ct) => Task.FromResult(_count);
public Task<long> ArchiveAsync(RetentionQuery query, string archiveLocation, CancellationToken ct) => Task.FromResult(_count);
}
}

@@ -0,0 +1,371 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notifier.Worker.Security;
namespace StellaOps.Notifier.Tests.Security;
public class HtmlSanitizerTests
{
private readonly HtmlSanitizerOptions _options;
private readonly DefaultHtmlSanitizer _sanitizer;
public HtmlSanitizerTests()
{
_options = new HtmlSanitizerOptions
{
DefaultProfile = "basic",
LogSanitization = false
};
_sanitizer = new DefaultHtmlSanitizer(
Options.Create(_options),
NullLogger<DefaultHtmlSanitizer>.Instance);
}
[Fact]
public void Sanitize_AllowedTags_Preserved()
{
// Arrange
var html = "<p>Hello <strong>World</strong></p>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.Contains("<p>", result);
Assert.Contains("<strong>", result);
Assert.Contains("</strong>", result);
Assert.Contains("</p>", result);
}
[Fact]
public void Sanitize_DisallowedTags_Removed()
{
// Arrange
var html = "<p>Hello</p><iframe src='evil.com'></iframe>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.Contains("<p>Hello</p>", result);
Assert.DoesNotContain("<iframe", result);
}
[Fact]
public void Sanitize_ScriptTags_Removed()
{
// Arrange
var html = "<p>Hello</p><script>alert('xss')</script>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.Contains("<p>Hello</p>", result);
Assert.DoesNotContain("<script", result);
Assert.DoesNotContain("alert", result);
}
[Fact]
public void Sanitize_EventHandlers_Removed()
{
// Arrange
var html = "<p onclick='alert(1)'>Hello</p>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.DoesNotContain("onclick", result);
Assert.Contains("<p>Hello</p>", result);
}
[Fact]
public void Sanitize_JavaScriptUrls_Removed()
{
// Arrange
var html = "<a href='javascript:alert(1)'>Click</a>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.DoesNotContain("javascript:", result);
}
[Fact]
public void Sanitize_AllowedAttributes_Preserved()
{
// Arrange
var html = "<a href='https://example.com' title='Example'>Link</a>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.Contains("href=", result);
Assert.Contains("https://example.com", result);
Assert.Contains("title=", result);
}
[Fact]
public void Sanitize_DisallowedAttributes_Removed()
{
// Arrange
var html = "<p data-custom='value' class='test'>Hello</p>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.DoesNotContain("data-custom", result);
Assert.Contains("class=", result); // class is allowed
}
[Fact]
public void Sanitize_WithMinimalProfile_OnlyBasicTags()
{
// Arrange
var html = "<p><a href='https://example.com'>Link</a></p>";
var profile = SanitizationProfile.Minimal;
// Act
var result = _sanitizer.Sanitize(html, profile);
// Assert
Assert.Contains("<p>", result);
Assert.DoesNotContain("<a", result); // links not in minimal
}
[Fact]
public void Sanitize_WithRichProfile_AllowsImagesAndTables()
{
// Arrange
var html = "<table><tr><td>Cell</td></tr></table><img src='test.png' alt='Test'>";
var profile = SanitizationProfile.Rich;
// Act
var result = _sanitizer.Sanitize(html, profile);
// Assert
Assert.Contains("<table>", result);
Assert.Contains("<img", result);
Assert.Contains("src=", result);
}
[Fact]
public void Sanitize_HtmlComments_Removed()
{
// Arrange
var html = "<p>Hello<!-- comment --></p>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.DoesNotContain("<!--", result);
Assert.DoesNotContain("comment", result);
}
[Fact]
public void Sanitize_EmptyString_ReturnsEmpty()
{
// Act
var result = _sanitizer.Sanitize("");
// Assert
Assert.Empty(result);
}
[Fact]
public void Sanitize_NullString_ReturnsNull()
{
// Act
var result = _sanitizer.Sanitize(null!);
// Assert
Assert.Null(result);
}
[Fact]
public void Validate_SafeHtml_ReturnsValid()
{
// Arrange
var html = "<p>Hello <strong>World</strong></p>";
// Act
var result = _sanitizer.Validate(html);
// Assert
Assert.True(result.IsValid);
Assert.Empty(result.Errors);
}
[Fact]
public void Validate_ScriptTag_ReturnsErrors()
{
// Arrange
var html = "<script>alert('xss')</script>";
// Act
var result = _sanitizer.Validate(html);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Type == HtmlValidationErrorType.ScriptDetected);
Assert.True(result.ContainedDangerousContent);
}
[Fact]
public void Validate_EventHandler_ReturnsErrors()
{
// Arrange
var html = "<p onclick='alert(1)'>Hello</p>";
// Act
var result = _sanitizer.Validate(html);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Type == HtmlValidationErrorType.EventHandlerDetected);
}
[Fact]
public void Validate_JavaScriptUrl_ReturnsErrors()
{
// Arrange
var html = "<a href='javascript:void(0)'>Click</a>";
// Act
var result = _sanitizer.Validate(html);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Type == HtmlValidationErrorType.JavaScriptUrlDetected);
}
[Fact]
public void Validate_DisallowedTags_ReturnsWarnings()
{
// Arrange
var html = "<p>Hello</p><custom-tag>Custom</custom-tag>";
// Act
var result = _sanitizer.Validate(html);
// Assert
Assert.Contains(result.RemovedTags, t => t == "custom-tag");
}
[Fact]
public void EscapeHtml_EscapesSpecialCharacters()
{
// Arrange
var text = "<script>alert('test')</script>";
// Act
var result = _sanitizer.EscapeHtml(text);
// Assert
Assert.DoesNotContain("<", result);
Assert.DoesNotContain(">", result);
Assert.Contains("&lt;", result);
Assert.Contains("&gt;", result);
}
[Fact]
public void StripTags_RemovesAllTags()
{
// Arrange
var html = "<p>Hello <strong>World</strong></p>";
// Act
var result = _sanitizer.StripTags(html);
// Assert
Assert.DoesNotContain("<", result);
Assert.DoesNotContain(">", result);
Assert.Contains("Hello", result);
Assert.Contains("World", result);
}
[Fact]
public void GetProfile_ExistingProfile_ReturnsProfile()
{
// Act
var profile = _sanitizer.GetProfile("basic");
// Assert
Assert.NotNull(profile);
Assert.Equal("basic", profile.Name);
}
[Fact]
public void GetProfile_NonExistentProfile_ReturnsNull()
{
// Act
var profile = _sanitizer.GetProfile("non-existent");
// Assert
Assert.Null(profile);
}
[Fact]
public void RegisterProfile_AddsCustomProfile()
{
// Arrange
var customProfile = new SanitizationProfile
{
Name = "custom",
AllowedTags = new HashSet<string> { "p", "custom-tag" }
};
// Act
_sanitizer.RegisterProfile("custom", customProfile);
var retrieved = _sanitizer.GetProfile("custom");
// Assert
Assert.NotNull(retrieved);
Assert.Equal("custom", retrieved.Name);
}
[Theory]
[InlineData("<p>Test</p>", "<p>Test</p>")]
[InlineData("<P>Test</P>", "<p>Test</p>")]
[InlineData("<DIV>Test</DIV>", "<div>Test</div>")]
public void Sanitize_NormalizesTagCase(string input, string expected)
{
// Act
var result = _sanitizer.Sanitize(input);
// Assert
Assert.Equal(expected, result.Trim());
}
[Fact]
public void Sanitize_SafeUrlSchemes_Preserved()
{
// Arrange
var html = "<a href='mailto:test@example.com'>Email</a>";
// Act
var result = _sanitizer.Sanitize(html);
// Assert
Assert.Contains("mailto:", result);
}
[Fact]
public void Sanitize_DataUrl_RemovedByDefault()
{
// Arrange - data URLs are disabled explicitly so the test does not depend on the profile's default
var html = "<img src='data:image/png;base64,abc123' />";
var profile = SanitizationProfile.Rich with { AllowDataUrls = false };
// Act
var result = _sanitizer.Sanitize(html, profile);
// Assert
Assert.DoesNotContain("data:", result);
}
}

@@ -0,0 +1,349 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Security;
namespace StellaOps.Notifier.Tests.Security;
public class SigningServiceTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly SigningServiceOptions _options;
private readonly LocalSigningKeyProvider _keyProvider;
private readonly SigningService _signingService;
public SigningServiceTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new SigningServiceOptions
{
LocalSigningKey = "test-signing-key-for-unit-tests",
Algorithm = "HMACSHA256",
DefaultExpiry = TimeSpan.FromHours(24)
};
_keyProvider = new LocalSigningKeyProvider(
Options.Create(_options),
_timeProvider);
_signingService = new SigningService(
_keyProvider,
Options.Create(_options),
_timeProvider,
NullLogger<SigningService>.Instance);
}
[Fact]
public async Task SignAsync_CreatesValidToken()
{
// Arrange
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(24)
};
// Act
var token = await _signingService.SignAsync(payload);
// Assert
Assert.NotNull(token);
Assert.Contains(".", token);
var parts = token.Split('.');
Assert.Equal(3, parts.Length); // header.body.signature
}
[Fact]
public async Task VerifyAsync_ValidToken_ReturnsValid()
{
// Arrange
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(24)
};
var token = await _signingService.SignAsync(payload);
// Act
var result = await _signingService.VerifyAsync(token);
// Assert
Assert.True(result.IsValid);
Assert.NotNull(result.Payload);
Assert.Equal(payload.TokenId, result.Payload.TokenId);
Assert.Equal(payload.Purpose, result.Payload.Purpose);
Assert.Equal(payload.TenantId, result.Payload.TenantId);
Assert.Equal(payload.Subject, result.Payload.Subject);
}
[Fact]
public async Task VerifyAsync_ExpiredToken_ReturnsExpired()
{
// Arrange
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(1)
};
var token = await _signingService.SignAsync(payload);
// Advance time past expiry
_timeProvider.Advance(TimeSpan.FromHours(2));
// Act
var result = await _signingService.VerifyAsync(token);
// Assert
Assert.False(result.IsValid);
Assert.Equal(SigningErrorCode.Expired, result.ErrorCode);
}
[Fact]
public async Task VerifyAsync_TamperedToken_ReturnsInvalidSignature()
{
// Arrange
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(24)
};
var token = await _signingService.SignAsync(payload);
// Tamper with the token
var parts = token.Split('.');
var tamperedToken = $"{parts[0]}.{parts[1]}.tampered_signature";
// Act
var result = await _signingService.VerifyAsync(tamperedToken);
// Assert
Assert.False(result.IsValid);
Assert.Equal(SigningErrorCode.InvalidSignature, result.ErrorCode);
}
[Fact]
public async Task VerifyAsync_MalformedToken_ReturnsInvalidFormat()
{
// Act
var result = await _signingService.VerifyAsync("not-a-valid-token");
// Assert
Assert.False(result.IsValid);
Assert.Equal(SigningErrorCode.InvalidFormat, result.ErrorCode);
}
[Fact]
public async Task GetTokenInfo_ValidToken_ReturnsInfo()
{
// Arrange
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(24)
};
var token = await _signingService.SignAsync(payload);
// Act
var info = _signingService.GetTokenInfo(token);
// Assert
Assert.NotNull(info);
Assert.Equal(payload.TokenId, info.TokenId);
Assert.Equal(payload.Purpose, info.Purpose);
Assert.Equal(payload.TenantId, info.TenantId);
}
[Fact]
public void GetTokenInfo_MalformedToken_ReturnsNull()
{
// Act
var info = _signingService.GetTokenInfo("not-a-valid-token");
// Assert
Assert.Null(info);
}
[Fact]
public async Task RotateKeyAsync_CreatesNewKey()
{
// Arrange
var keysBefore = await _keyProvider.ListKeyIdsAsync();
// Act
var success = await _signingService.RotateKeyAsync();
// Assert
Assert.True(success);
var keysAfter = await _keyProvider.ListKeyIdsAsync();
Assert.True(keysAfter.Count > keysBefore.Count);
}
[Fact]
public async Task SignAsync_WithClaims_PreservesClaims()
{
// Arrange
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(24),
Claims = new Dictionary<string, string>
{
["custom1"] = "value1",
["custom2"] = "value2"
}
};
// Act
var token = await _signingService.SignAsync(payload);
var result = await _signingService.VerifyAsync(token);
// Assert
Assert.True(result.IsValid);
Assert.NotNull(result.Payload);
Assert.Equal(2, result.Payload.Claims.Count);
Assert.Equal("value1", result.Payload.Claims["custom1"]);
Assert.Equal("value2", result.Payload.Claims["custom2"]);
}
[Fact]
public async Task VerifyAsync_AfterKeyRotation_StillVerifiesOldTokens()
{
// Arrange - sign with old key
var payload = new SigningPayload
{
TokenId = "token-001",
Purpose = "incident.ack",
TenantId = "tenant1",
Subject = "incident-123",
ExpiresAt = _timeProvider.GetUtcNow().AddHours(24)
};
var token = await _signingService.SignAsync(payload);
// Rotate key
await _signingService.RotateKeyAsync();
// Act - verify the token signed with the old key; that key should remain available after rotation
var result = await _signingService.VerifyAsync(token);
// Assert
Assert.True(result.IsValid);
}
}
public class LocalSigningKeyProviderTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly SigningServiceOptions _options;
private readonly LocalSigningKeyProvider _keyProvider;
public LocalSigningKeyProviderTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new SigningServiceOptions
{
LocalSigningKey = "test-key",
KeyRetentionPeriod = TimeSpan.FromDays(90)
};
_keyProvider = new LocalSigningKeyProvider(
Options.Create(_options),
_timeProvider);
}
[Fact]
public async Task GetCurrentKeyAsync_ReturnsKey()
{
// Act
var key = await _keyProvider.GetCurrentKeyAsync();
// Assert
Assert.NotNull(key);
Assert.True(key.IsCurrent);
Assert.NotEmpty(key.KeyMaterial);
}
[Fact]
public async Task GetKeyByIdAsync_ExistingKey_ReturnsKey()
{
// Arrange
var currentKey = await _keyProvider.GetCurrentKeyAsync();
// Act
var key = await _keyProvider.GetKeyByIdAsync(currentKey.KeyId);
// Assert
Assert.NotNull(key);
Assert.Equal(currentKey.KeyId, key.KeyId);
}
[Fact]
public async Task GetKeyByIdAsync_NonExistentKey_ReturnsNull()
{
// Act
var key = await _keyProvider.GetKeyByIdAsync("non-existent-key");
// Assert
Assert.Null(key);
}
[Fact]
public async Task RotateAsync_CreatesNewCurrentKey()
{
// Arrange
var oldKey = await _keyProvider.GetCurrentKeyAsync();
// Act
var newKey = await _keyProvider.RotateAsync();
// Assert
Assert.NotEqual(oldKey.KeyId, newKey.KeyId);
Assert.True(newKey.IsCurrent);
var currentKey = await _keyProvider.GetCurrentKeyAsync();
Assert.Equal(newKey.KeyId, currentKey.KeyId);
}
[Fact]
public async Task RotateAsync_KeepsOldKeyForVerification()
{
// Arrange
var oldKey = await _keyProvider.GetCurrentKeyAsync();
// Act
await _keyProvider.RotateAsync();
// Assert - old key should still be retrievable
var retrievedOldKey = await _keyProvider.GetKeyByIdAsync(oldKey.KeyId);
Assert.NotNull(retrievedOldKey);
Assert.False(retrievedOldKey.IsCurrent);
}
[Fact]
public async Task ListKeyIdsAsync_ReturnsAllKeys()
{
// Arrange
await _keyProvider.RotateAsync();
await _keyProvider.RotateAsync();
// Act
var keyIds = await _keyProvider.ListKeyIdsAsync();
// Assert
Assert.Equal(3, keyIds.Count); // Initial + 2 rotations
}
}

@@ -0,0 +1,392 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Security;
namespace StellaOps.Notifier.Tests.Security;
public class TenantIsolationValidatorTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly TenantIsolationOptions _options;
private readonly InMemoryTenantIsolationValidator _validator;
public TenantIsolationValidatorTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new TenantIsolationOptions
{
EnforceStrict = true,
LogViolations = false,
RecordViolations = true,
AllowCrossTenantGrants = true,
SystemResourceTypes = ["system-template"],
AdminTenantPatterns = ["^admin$", "^system$"]
};
_validator = new InMemoryTenantIsolationValidator(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryTenantIsolationValidator>.Instance);
}
[Fact]
public async Task ValidateResourceAccessAsync_SameTenant_Allowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Act
var result = await _validator.ValidateResourceAccessAsync(
"tenant1", "delivery", "delivery-001", TenantAccessOperation.Read);
// Assert
Assert.True(result.IsAllowed);
Assert.Equal(TenantValidationType.SameTenant, result.ValidationType);
}
[Fact]
public async Task ValidateResourceAccessAsync_DifferentTenant_Denied()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Act
var result = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
// Assert
Assert.False(result.IsAllowed);
Assert.Equal(TenantValidationType.Denied, result.ValidationType);
}
[Fact]
public async Task ValidateResourceAccessAsync_AdminTenant_AlwaysAllowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Act
var result = await _validator.ValidateResourceAccessAsync(
"admin", "delivery", "delivery-001", TenantAccessOperation.Read);
// Assert
Assert.True(result.IsAllowed);
Assert.Equal(TenantValidationType.SystemResource, result.ValidationType);
}
[Fact]
public async Task ValidateResourceAccessAsync_SystemResource_AlwaysAllowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "system-template", "template-001");
// Act
var result = await _validator.ValidateResourceAccessAsync(
"tenant2", "system-template", "template-001", TenantAccessOperation.Read);
// Assert
Assert.True(result.IsAllowed);
Assert.Equal(TenantValidationType.SystemResource, result.ValidationType);
}
[Fact]
public async Task ValidateResourceAccessAsync_UnregisteredResource_Allowed()
{
// Act - resource not registered
var result = await _validator.ValidateResourceAccessAsync(
"tenant1", "delivery", "unregistered-delivery", TenantAccessOperation.Read);
// Assert
Assert.True(result.IsAllowed);
Assert.Equal(TenantValidationType.ResourceNotFound, result.ValidationType);
}
[Fact]
public async Task ValidateDeliveryAsync_SameTenant_Allowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Act
var result = await _validator.ValidateDeliveryAsync("tenant1", "delivery-001");
// Assert
Assert.True(result.IsAllowed);
}
[Fact]
public async Task ValidateChannelAsync_SameTenant_Allowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "channel", "channel-001");
// Act
var result = await _validator.ValidateChannelAsync("tenant1", "channel-001");
// Assert
Assert.True(result.IsAllowed);
}
[Fact]
public async Task ValidateTemplateAsync_SameTenant_Allowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "template", "template-001");
// Act
var result = await _validator.ValidateTemplateAsync("tenant1", "template-001");
// Assert
Assert.True(result.IsAllowed);
}
[Fact]
public async Task ValidateSubscriptionAsync_SameTenant_Allowed()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "subscription", "subscription-001");
// Act
var result = await _validator.ValidateSubscriptionAsync("tenant1", "subscription-001");
// Assert
Assert.True(result.IsAllowed);
}
[Fact]
public async Task GrantCrossTenantAccessAsync_EnablesAccess()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Act
await _validator.GrantCrossTenantAccessAsync(
"tenant1", "tenant2", "delivery", "delivery-001",
TenantAccessOperation.Read, null, "admin");
var result = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
// Assert
Assert.True(result.IsAllowed);
Assert.True(result.IsCrossTenant);
Assert.NotNull(result.GrantId);
}
[Fact]
public async Task RevokeCrossTenantAccessAsync_DisablesAccess()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
await _validator.GrantCrossTenantAccessAsync(
"tenant1", "tenant2", "delivery", "delivery-001",
TenantAccessOperation.Read, null, "admin");
// Act
await _validator.RevokeCrossTenantAccessAsync(
"tenant1", "tenant2", "delivery", "delivery-001", "admin");
var result = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
// Assert
Assert.False(result.IsAllowed);
}
[Fact]
public async Task GrantCrossTenantAccessAsync_WithExpiry_ExpiresCorrectly()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
var expiresAt = _timeProvider.GetUtcNow().AddHours(1);
await _validator.GrantCrossTenantAccessAsync(
"tenant1", "tenant2", "delivery", "delivery-001",
TenantAccessOperation.Read, expiresAt, "admin");
// Verify access before expiry
var resultBefore = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
Assert.True(resultBefore.IsAllowed);
// Advance time past expiry
_timeProvider.Advance(TimeSpan.FromHours(2));
// Act
var resultAfter = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
// Assert
Assert.False(resultAfter.IsAllowed);
}
[Fact]
public async Task GrantCrossTenantAccessAsync_OperationRestrictions_Enforced()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
await _validator.GrantCrossTenantAccessAsync(
"tenant1", "tenant2", "delivery", "delivery-001",
TenantAccessOperation.Read, null, "admin");
// Act - Read should be allowed
var readResult = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
// Write should be denied (not in granted operations)
var writeResult = await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Write);
// Assert
Assert.True(readResult.IsAllowed);
Assert.False(writeResult.IsAllowed);
}
[Fact]
public async Task GetViolationsAsync_ReturnsRecordedViolations()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Trigger a violation
await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
// Act
var violations = await _validator.GetViolationsAsync("tenant2");
// Assert
Assert.Single(violations);
Assert.Equal("tenant2", violations[0].RequestingTenantId);
Assert.Equal("tenant1", violations[0].ResourceOwnerTenantId);
}
[Fact]
public async Task GetViolationsAsync_FiltersBySince()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
_timeProvider.Advance(TimeSpan.FromHours(2));
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-002");
await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-002", TenantAccessOperation.Read);
// Act
var since = _timeProvider.GetUtcNow().AddHours(-1);
var violations = await _validator.GetViolationsAsync(null, since);
// Assert
Assert.Single(violations);
Assert.Equal("delivery-002", violations[0].ResourceId);
}
[Fact]
public async Task RegisterResourceAsync_AddsResource()
{
// Act
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
var resources = await _validator.GetTenantResourcesAsync("tenant1");
// Assert
Assert.Single(resources);
Assert.Equal("tenant1", resources[0].TenantId);
Assert.Equal("delivery", resources[0].ResourceType);
Assert.Equal("delivery-001", resources[0].ResourceId);
}
[Fact]
public async Task UnregisterResourceAsync_RemovesResource()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Act
await _validator.UnregisterResourceAsync("delivery", "delivery-001");
var resources = await _validator.GetTenantResourcesAsync("tenant1");
// Assert
Assert.Empty(resources);
}
[Fact]
public async Task GetTenantResourcesAsync_FiltersByResourceType()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
await _validator.RegisterResourceAsync("tenant1", "channel", "channel-001");
// Act
var deliveries = await _validator.GetTenantResourcesAsync("tenant1", "delivery");
// Assert
Assert.Single(deliveries);
Assert.Equal("delivery", deliveries[0].ResourceType);
}
[Fact]
public async Task RunFuzzTestAsync_AllTestsPass()
{
// Arrange
var config = new TenantFuzzTestConfig
{
Iterations = 20,
TenantIds = ["tenant-a", "tenant-b"],
ResourceTypes = ["delivery", "channel"],
TestCrossTenantGrants = true,
TestEdgeCases = true,
Seed = 42 // For reproducibility
};
// Act
var result = await _validator.RunFuzzTestAsync(config);
// Assert
Assert.True(result.AllPassed);
Assert.True(result.TotalTests > 0);
Assert.Equal(result.TotalTests, result.PassedTests);
Assert.Empty(result.Failures);
}
[Fact]
public async Task ValidateCrossTenantAccessAsync_SameTenant_Allowed()
{
// Act
var result = await _validator.ValidateCrossTenantAccessAsync(
"tenant1", "tenant1", "delivery", "delivery-001");
// Assert
Assert.True(result.IsAllowed);
Assert.Equal(TenantValidationType.SameTenant, result.ValidationType);
}
[Fact]
public async Task ViolationSeverity_ReflectsOperation()
{
// Arrange
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-001");
// Trigger one Read violation and one Delete violation against tenant1's resources
await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-001", TenantAccessOperation.Read);
await _validator.RegisterResourceAsync("tenant1", "delivery", "delivery-002");
await _validator.ValidateResourceAccessAsync(
"tenant2", "delivery", "delivery-002", TenantAccessOperation.Delete);
// Act
var violations = await _validator.GetViolationsAsync("tenant2");
// Assert
var readViolation = violations.FirstOrDefault(v => v.ResourceId == "delivery-001");
var deleteViolation = violations.FirstOrDefault(v => v.ResourceId == "delivery-002");
Assert.NotNull(readViolation);
Assert.NotNull(deleteViolation);
Assert.Equal(ViolationSeverity.Low, readViolation.Severity);
Assert.Equal(ViolationSeverity.Critical, deleteViolation.Severity);
}
}


@@ -0,0 +1,368 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Security;
namespace StellaOps.Notifier.Tests.Security;
public class WebhookSecurityServiceTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly WebhookSecurityOptions _options;
private readonly InMemoryWebhookSecurityService _webhookService;
public WebhookSecurityServiceTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new WebhookSecurityOptions
{
DefaultAlgorithm = "SHA256",
EnableReplayProtection = true,
NonceCacheExpiry = TimeSpan.FromMinutes(10)
};
_webhookService = new InMemoryWebhookSecurityService(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryWebhookSecurityService>.Instance);
}
[Fact]
public async Task ValidateAsync_NoConfig_ReturnsValidWithWarning()
{
// Arrange
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = "{\"test\": \"data\"}"
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.True(result.IsValid);
Assert.Contains(result.Warnings, w => w.Contains("No webhook security configuration"));
}
[Fact]
public async Task ValidateAsync_ValidSignature_ReturnsValid()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
Algorithm = "SHA256",
RequireSignature = true
};
await _webhookService.RegisterWebhookAsync(config);
var body = "{\"test\": \"data\"}";
var signature = _webhookService.GenerateSignature(body, config.SecretKey);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = body,
Signature = signature
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.True(result.IsValid);
Assert.True(result.PassedChecks.HasFlag(WebhookValidationChecks.SignatureValid));
}
[Fact]
public async Task ValidateAsync_InvalidSignature_ReturnsDenied()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
RequireSignature = true
};
await _webhookService.RegisterWebhookAsync(config);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = "{\"test\": \"data\"}",
Signature = "invalid-signature"
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.False(result.IsValid);
Assert.True(result.FailedChecks.HasFlag(WebhookValidationChecks.SignatureValid));
}
[Fact]
public async Task ValidateAsync_MissingSignature_ReturnsDenied()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
RequireSignature = true
};
await _webhookService.RegisterWebhookAsync(config);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = "{\"test\": \"data\"}"
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Contains("Missing signature"));
}
[Fact]
public async Task ValidateAsync_IpNotInAllowlist_ReturnsDenied()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
RequireSignature = false,
EnforceIpAllowlist = true,
AllowedIps = ["192.168.1.0/24"]
};
await _webhookService.RegisterWebhookAsync(config);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = "{\"test\": \"data\"}",
SourceIp = "10.0.0.1"
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.False(result.IsValid);
Assert.True(result.FailedChecks.HasFlag(WebhookValidationChecks.IpAllowed));
}
[Fact]
public async Task ValidateAsync_IpInAllowlist_ReturnsValid()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
RequireSignature = false,
EnforceIpAllowlist = true,
AllowedIps = ["192.168.1.0/24"]
};
await _webhookService.RegisterWebhookAsync(config);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = "{\"test\": \"data\"}",
SourceIp = "192.168.1.100"
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.True(result.IsValid);
Assert.True(result.PassedChecks.HasFlag(WebhookValidationChecks.IpAllowed));
}
[Fact]
public async Task ValidateAsync_ExpiredTimestamp_ReturnsDenied()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
RequireSignature = false,
MaxRequestAge = TimeSpan.FromMinutes(5)
};
await _webhookService.RegisterWebhookAsync(config);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = "{\"test\": \"data\"}",
Timestamp = _timeProvider.GetUtcNow().AddMinutes(-10)
};
// Act
var result = await _webhookService.ValidateAsync(request);
// Assert
Assert.False(result.IsValid);
Assert.True(result.FailedChecks.HasFlag(WebhookValidationChecks.NotExpired));
}
[Fact]
public async Task ValidateAsync_ReplayAttack_ReturnsDenied()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
RequireSignature = true
};
await _webhookService.RegisterWebhookAsync(config);
var body = "{\"test\": \"data\"}";
var signature = _webhookService.GenerateSignature(body, config.SecretKey);
var request = new WebhookValidationRequest
{
TenantId = "tenant1",
ChannelId = "channel1",
Body = body,
Signature = signature,
Timestamp = _timeProvider.GetUtcNow()
};
// First request should succeed
var result1 = await _webhookService.ValidateAsync(request);
Assert.True(result1.IsValid);
// Act - second request with same signature should fail
var result2 = await _webhookService.ValidateAsync(request);
// Assert
Assert.False(result2.IsValid);
Assert.True(result2.FailedChecks.HasFlag(WebhookValidationChecks.NotReplay));
}
[Fact]
public void GenerateSignature_ProducesConsistentOutput()
{
// Arrange
var payload = "{\"test\": \"data\"}";
var secretKey = "test-secret";
// Act
var sig1 = _webhookService.GenerateSignature(payload, secretKey);
var sig2 = _webhookService.GenerateSignature(payload, secretKey);
// Assert
Assert.Equal(sig1, sig2);
}
[Fact]
public async Task UpdateAllowlistAsync_UpdatesConfig()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
EnforceIpAllowlist = true,
AllowedIps = ["192.168.1.0/24"]
};
await _webhookService.RegisterWebhookAsync(config);
// Act
await _webhookService.UpdateAllowlistAsync(
"tenant1", "channel1", ["10.0.0.0/8"], "admin");
// Assert
var updatedConfig = await _webhookService.GetConfigAsync("tenant1", "channel1");
Assert.NotNull(updatedConfig);
Assert.Single(updatedConfig.AllowedIps);
Assert.Equal("10.0.0.0/8", updatedConfig.AllowedIps[0]);
}
[Fact]
public async Task IsIpAllowedAsync_NoConfig_ReturnsTrue()
{
// Act
var allowed = await _webhookService.IsIpAllowedAsync("tenant1", "channel1", "192.168.1.1");
// Assert
Assert.True(allowed);
}
[Fact]
public async Task IsIpAllowedAsync_CidrMatch_ReturnsTrue()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
EnforceIpAllowlist = true,
AllowedIps = ["192.168.1.0/24"]
};
await _webhookService.RegisterWebhookAsync(config);
// Act
var allowed = await _webhookService.IsIpAllowedAsync("tenant1", "channel1", "192.168.1.50");
// Assert
Assert.True(allowed);
}
[Fact]
public async Task IsIpAllowedAsync_ExactMatch_ReturnsTrue()
{
// Arrange
var config = new WebhookSecurityConfig
{
ConfigId = "config-001",
TenantId = "tenant1",
ChannelId = "channel1",
SecretKey = "test-secret-key",
EnforceIpAllowlist = true,
AllowedIps = ["192.168.1.100"]
};
await _webhookService.RegisterWebhookAsync(config);
// Act
var allowed = await _webhookService.IsIpAllowedAsync("tenant1", "channel1", "192.168.1.100");
// Assert
Assert.True(allowed);
}
}


@@ -0,0 +1,439 @@
using System.Collections.Immutable;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notify.Engine;
using StellaOps.Notify.Models;
using StellaOps.Notify.Storage.Mongo.Repositories;
using StellaOps.Notifier.Worker.Simulation;
namespace StellaOps.Notifier.Tests.Simulation;
public class SimulationEngineTests
{
private readonly Mock<INotifyRuleRepository> _ruleRepository;
private readonly Mock<INotifyRuleEvaluator> _ruleEvaluator;
private readonly Mock<INotifyChannelRepository> _channelRepository;
private readonly FakeTimeProvider _timeProvider;
private readonly SimulationOptions _options;
private readonly SimulationEngine _engine;
public SimulationEngineTests()
{
_ruleRepository = new Mock<INotifyRuleRepository>();
_ruleEvaluator = new Mock<INotifyRuleEvaluator>();
_channelRepository = new Mock<INotifyChannelRepository>();
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 10, 0, 0, TimeSpan.Zero));
_options = new SimulationOptions();
_engine = new SimulationEngine(
_ruleRepository.Object,
_ruleEvaluator.Object,
_channelRepository.Object,
Options.Create(_options),
_timeProvider,
NullLogger<SimulationEngine>.Instance);
}
[Fact]
public async Task SimulateAsync_WithMatchingRules_ReturnsMatchedResults()
{
// Arrange
var rules = new List<NotifyRule> { CreateTestRule("rule-1") };
var events = new List<NotifyEvent> { CreateTestEvent("event.test") };
var channel = CreateTestChannel("channel-1");
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(rules);
_channelRepository
.Setup(c => c.GetAsync(It.IsAny<string>(), "channel-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(channel);
_ruleEvaluator
.Setup(e => e.Evaluate(It.IsAny<NotifyRule>(), It.IsAny<NotifyEvent>(), It.IsAny<DateTimeOffset?>()))
.Returns((NotifyRule r, NotifyEvent _, DateTimeOffset? ts) =>
NotifyRuleEvaluationOutcome.Matched(r, r.Actions, ts ?? _timeProvider.GetUtcNow()));
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = events
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.NotNull(result);
Assert.StartsWith("sim-", result.SimulationId);
Assert.Equal(1, result.TotalEvents);
Assert.Equal(1, result.TotalRules);
Assert.Equal(1, result.MatchedEvents);
Assert.True(result.TotalActionsTriggered > 0);
}
[Fact]
public async Task SimulateAsync_WithNoMatchingRules_ReturnsNoMatches()
{
// Arrange
var rules = new List<NotifyRule> { CreateTestRule("rule-1") };
var events = new List<NotifyEvent> { CreateTestEvent("event.test") };
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(rules);
_ruleEvaluator
.Setup(e => e.Evaluate(It.IsAny<NotifyRule>(), It.IsAny<NotifyEvent>(), It.IsAny<DateTimeOffset?>()))
.Returns((NotifyRule r, NotifyEvent _, DateTimeOffset? _) =>
NotifyRuleEvaluationOutcome.NotMatched(r, "event_kind_mismatch"));
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = events
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.NotNull(result);
Assert.Equal(0, result.MatchedEvents);
Assert.Equal(0, result.TotalActionsTriggered);
}
[Fact]
public async Task SimulateAsync_WithIncludeNonMatches_ReturnsNonMatchReasons()
{
// Arrange
var rules = new List<NotifyRule> { CreateTestRule("rule-1") };
var events = new List<NotifyEvent> { CreateTestEvent("event.test") };
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(rules);
_ruleEvaluator
.Setup(e => e.Evaluate(It.IsAny<NotifyRule>(), It.IsAny<NotifyEvent>(), It.IsAny<DateTimeOffset?>()))
.Returns((NotifyRule r, NotifyEvent _, DateTimeOffset? _) =>
NotifyRuleEvaluationOutcome.NotMatched(r, "severity_below_threshold"));
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = events,
IncludeNonMatches = true
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.NotNull(result);
Assert.Single(result.EventResults);
Assert.NotNull(result.EventResults[0].NonMatchedRules);
Assert.Single(result.EventResults[0].NonMatchedRules);
Assert.Equal("severity_below_threshold", result.EventResults[0].NonMatchedRules[0].Reason);
}
[Fact]
public async Task SimulateAsync_WithProvidedRules_UsesProvidedRules()
{
// Arrange
var providedRules = new List<NotifyRule>
{
CreateTestRule("custom-rule-1"),
CreateTestRule("custom-rule-2")
};
var events = new List<NotifyEvent> { CreateTestEvent("event.test") };
_ruleEvaluator
.Setup(e => e.Evaluate(It.IsAny<NotifyRule>(), It.IsAny<NotifyEvent>(), It.IsAny<DateTimeOffset?>()))
.Returns((NotifyRule r, NotifyEvent _, DateTimeOffset? _) =>
NotifyRuleEvaluationOutcome.NotMatched(r, "event_kind_mismatch"));
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = events,
Rules = providedRules
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.Equal(2, result.TotalRules);
_ruleRepository.Verify(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()), Times.Never);
}
[Fact]
public async Task SimulateAsync_WithNoEvents_ReturnsEmptyResult()
{
// Arrange
var rules = new List<NotifyRule> { CreateTestRule("rule-1") };
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(rules);
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = new List<NotifyEvent>()
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.NotNull(result);
Assert.Equal(0, result.TotalEvents);
Assert.Empty(result.EventResults);
}
[Fact]
public async Task SimulateAsync_WithNoRules_ReturnsEmptyResult()
{
// Arrange
var events = new List<NotifyEvent> { CreateTestEvent("event.test") };
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(new List<NotifyRule>());
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = events
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.NotNull(result);
Assert.Equal(0, result.TotalRules);
}
[Fact]
public async Task SimulateAsync_BuildsRuleSummaries()
{
// Arrange
var rules = new List<NotifyRule>
{
CreateTestRule("rule-1"),
CreateTestRule("rule-2")
};
var events = new List<NotifyEvent>
{
CreateTestEvent("event.test"),
CreateTestEvent("event.test"),
CreateTestEvent("event.test")
};
var channel = CreateTestChannel("channel-1");
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(rules);
_channelRepository
.Setup(c => c.GetAsync(It.IsAny<string>(), "channel-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(channel);
_ruleEvaluator
.Setup(e => e.Evaluate(It.IsAny<NotifyRule>(), It.IsAny<NotifyEvent>(), It.IsAny<DateTimeOffset?>()))
.Returns((NotifyRule r, NotifyEvent _, DateTimeOffset? ts) =>
{
// The first rule matches every event; the second rule matches none
if (r.RuleId == "rule-1")
return NotifyRuleEvaluationOutcome.Matched(r, r.Actions, ts ?? _timeProvider.GetUtcNow());
return NotifyRuleEvaluationOutcome.NotMatched(r, "event_kind_mismatch");
});
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = events
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.Equal(2, result.RuleSummaries.Count);
var rule1Summary = result.RuleSummaries.First(s => s.RuleId == "rule-1");
Assert.Equal(3, rule1Summary.MatchCount);
Assert.Equal(100.0, rule1Summary.MatchPercentage);
var rule2Summary = result.RuleSummaries.First(s => s.RuleId == "rule-2");
Assert.Equal(0, rule2Summary.MatchCount);
Assert.Equal(0.0, rule2Summary.MatchPercentage);
}
[Fact]
public async Task SimulateAsync_FiltersDisabledRulesWhenEnabledRulesOnly()
{
// Arrange
var rules = new List<NotifyRule>
{
CreateTestRule("rule-1", enabled: true),
CreateTestRule("rule-2", enabled: false)
};
_ruleRepository
.Setup(r => r.ListAsync(It.IsAny<string>(), It.IsAny<CancellationToken>()))
.ReturnsAsync(rules);
_ruleEvaluator
.Setup(e => e.Evaluate(It.IsAny<NotifyRule>(), It.IsAny<NotifyEvent>(), It.IsAny<DateTimeOffset?>()))
.Returns((NotifyRule r, NotifyEvent _, DateTimeOffset? _) =>
NotifyRuleEvaluationOutcome.NotMatched(r, "test"));
var request = new SimulationRequest
{
TenantId = "tenant1",
Events = new List<NotifyEvent> { CreateTestEvent("event.test") },
EnabledRulesOnly = true
};
// Act
var result = await _engine.SimulateAsync(request);
// Assert
Assert.Equal(1, result.TotalRules);
}
[Fact]
public async Task ValidateRuleAsync_ValidRule_ReturnsValid()
{
// Arrange
var rule = CreateTestRule("valid-rule");
// Act
var result = await _engine.ValidateRuleAsync(rule);
// Assert
Assert.True(result.IsValid);
Assert.Empty(result.Errors);
}
[Fact]
public async Task ValidateRuleAsync_BroadMatchRule_ReturnsWarning()
{
// Arrange
var rule = NotifyRule.Create(
ruleId: "broad-rule",
tenantId: "tenant1",
name: "Broad Rule",
match: NotifyRuleMatch.Create(),
actions: new[] { NotifyRuleAction.Create("action-1", "channel-1") },
enabled: true);
// Act
var result = await _engine.ValidateRuleAsync(rule);
// Assert
Assert.True(result.IsValid);
Assert.Contains(result.Warnings, w => w.Code == "broad_match");
}
[Fact]
public async Task ValidateRuleAsync_DisabledRule_ReturnsWarning()
{
// Arrange
var rule = CreateTestRule("disabled-rule", enabled: false);
// Act
var result = await _engine.ValidateRuleAsync(rule);
// Assert
Assert.True(result.IsValid);
Assert.Contains(result.Warnings, w => w.Code == "rule_disabled");
}
[Fact]
public async Task ValidateRuleAsync_UnknownSeverity_ReturnsWarning()
{
// Arrange
var rule = NotifyRule.Create(
ruleId: "bad-severity-rule",
tenantId: "tenant1",
name: "Bad Severity Rule",
match: NotifyRuleMatch.Create(minSeverity: "mega-critical"),
actions: new[] { NotifyRuleAction.Create("action-1", "channel-1") },
enabled: true);
// Act
var result = await _engine.ValidateRuleAsync(rule);
// Assert
Assert.True(result.IsValid);
Assert.Contains(result.Warnings, w => w.Code == "unknown_severity");
}
[Fact]
public async Task ValidateRuleAsync_NoEnabledActions_ReturnsWarning()
{
// Arrange
var rule = NotifyRule.Create(
ruleId: "no-actions-rule",
tenantId: "tenant1",
name: "No Actions Rule",
match: NotifyRuleMatch.Create(eventKinds: new[] { "test" }),
actions: new[] { NotifyRuleAction.Create("action-1", "channel-1", enabled: false) },
enabled: true);
// Act
var result = await _engine.ValidateRuleAsync(rule);
// Assert
Assert.True(result.IsValid);
Assert.Contains(result.Warnings, w => w.Code == "no_enabled_actions");
}
private NotifyRule CreateTestRule(string ruleId, bool enabled = true)
{
return NotifyRule.Create(
ruleId: ruleId,
tenantId: "tenant1",
name: $"Test Rule {ruleId}",
match: NotifyRuleMatch.Create(eventKinds: new[] { "event.test" }),
actions: new[]
{
NotifyRuleAction.Create(
actionId: "action-1",
channel: "channel-1",
template: "default",
enabled: true)
},
enabled: enabled);
}
private NotifyEvent CreateTestEvent(string kind)
{
return NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: kind,
tenant: "tenant1",
ts: _timeProvider.GetUtcNow(),
payload: null);
}
private NotifyChannel CreateTestChannel(string channelId)
{
return NotifyChannel.Create(
channelId: channelId,
tenantId: "tenant1",
name: $"Test Channel {channelId}",
type: NotifyChannelType.Custom,
enabled: true);
}
}


@@ -0,0 +1,326 @@
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.StormBreaker;
namespace StellaOps.Notifier.Tests.StormBreaker;
public class InMemoryStormBreakerTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly StormBreakerOptions _options;
private readonly InMemoryStormBreaker _stormBreaker;
public InMemoryStormBreakerTests()
{
_timeProvider = new FakeTimeProvider(DateTimeOffset.UtcNow);
_options = new StormBreakerOptions
{
Enabled = true,
DefaultThreshold = 10,
DefaultWindow = TimeSpan.FromMinutes(5),
SummaryInterval = TimeSpan.FromMinutes(15),
StormCooldown = TimeSpan.FromMinutes(10),
MaxEventsTracked = 100,
MaxSampleEvents = 5
};
_stormBreaker = new InMemoryStormBreaker(
Options.Create(_options),
_timeProvider,
NullLogger<InMemoryStormBreaker>.Instance);
}
[Fact]
public async Task EvaluateAsync_BelowThreshold_ReturnsNoStorm()
{
// Act
var result = await _stormBreaker.EvaluateAsync("tenant1", "key1", "event1");
// Assert
Assert.False(result.IsStorm);
Assert.Equal(StormAction.SendNormally, result.Action);
Assert.Equal(1, result.EventCount);
}
[Fact]
public async Task EvaluateAsync_AtThreshold_DetectsStorm()
{
// Arrange - add 9 events, one short of the configured threshold of 10
for (int i = 0; i < 9; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Act - 10th event triggers storm
var result = await _stormBreaker.EvaluateAsync("tenant1", "key1", "event9");
// Assert
Assert.True(result.IsStorm);
Assert.Equal(StormAction.SendStormAlert, result.Action);
Assert.Equal(10, result.EventCount);
Assert.NotNull(result.StormStartedAt);
}
[Fact]
public async Task EvaluateAsync_AfterStormDetected_SuppressesEvents()
{
// Arrange - trigger storm
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Act - next event after storm detected
var result = await _stormBreaker.EvaluateAsync("tenant1", "key1", "event10");
// Assert
Assert.True(result.IsStorm);
Assert.Equal(StormAction.Suppress, result.Action);
Assert.Equal(1, result.SuppressedCount);
}
[Fact]
public async Task EvaluateAsync_AtSummaryInterval_TriggersSummary()
{
// Arrange - trigger storm
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Advance time past summary interval
_timeProvider.Advance(TimeSpan.FromMinutes(16));
// Act
var result = await _stormBreaker.EvaluateAsync("tenant1", "key1", "event_after_interval");
// Assert
Assert.True(result.IsStorm);
Assert.Equal(StormAction.SendSummary, result.Action);
}
[Fact]
public async Task EvaluateAsync_DisabledStormBreaker_ReturnsNoStorm()
{
// Arrange
var disabledOptions = new StormBreakerOptions { Enabled = false };
var disabledBreaker = new InMemoryStormBreaker(
Options.Create(disabledOptions),
_timeProvider,
NullLogger<InMemoryStormBreaker>.Instance);
for (int i = 0; i < 20; i++)
{
var result = await disabledBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
// All events should return no storm when disabled
Assert.False(result.IsStorm);
Assert.Equal(StormAction.SendNormally, result.Action);
}
}
[Fact]
public async Task EvaluateAsync_DifferentKeys_TrackedSeparately()
{
// Arrange - trigger storm for key1
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Act
var result1 = await _stormBreaker.EvaluateAsync("tenant1", "key1", "eventA");
var result2 = await _stormBreaker.EvaluateAsync("tenant1", "key2", "eventB");
// Assert
Assert.True(result1.IsStorm);
Assert.False(result2.IsStorm);
}
[Fact]
public async Task EvaluateAsync_DifferentTenants_TrackedSeparately()
{
// Arrange - trigger storm for tenant1
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Act
var result1 = await _stormBreaker.EvaluateAsync("tenant1", "key1", "eventA");
var result2 = await _stormBreaker.EvaluateAsync("tenant2", "key1", "eventB");
// Assert
Assert.True(result1.IsStorm);
Assert.False(result2.IsStorm);
}
[Fact]
public async Task GetStateAsync_ExistingStorm_ReturnsState()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Act
var state = await _stormBreaker.GetStateAsync("tenant1", "key1");
// Assert
Assert.NotNull(state);
Assert.Equal("tenant1", state.TenantId);
Assert.Equal("key1", state.StormKey);
Assert.True(state.IsActive);
Assert.Equal(10, state.EventIds.Count);
}
[Fact]
public async Task GetStateAsync_NoStorm_ReturnsNull()
{
// Act
var state = await _stormBreaker.GetStateAsync("tenant1", "nonexistent");
// Assert
Assert.Null(state);
}
[Fact]
public async Task GetActiveStormsAsync_ReturnsActiveStormsOnly()
{
// Arrange - create two storms
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event1_{i}");
await _stormBreaker.EvaluateAsync("tenant1", "key2", $"event2_{i}");
}
// Create a non-storm (below threshold)
await _stormBreaker.EvaluateAsync("tenant1", "key3", "event3_0");
// Act
var activeStorms = await _stormBreaker.GetActiveStormsAsync("tenant1");
// Assert
Assert.Equal(2, activeStorms.Count);
Assert.Contains(activeStorms, s => s.StormKey == "key1");
Assert.Contains(activeStorms, s => s.StormKey == "key2");
}
[Fact]
public async Task ClearAsync_RemovesStormState()
{
// Arrange
for (int i = 0; i < 10; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
var beforeClear = await _stormBreaker.GetStateAsync("tenant1", "key1");
Assert.NotNull(beforeClear);
// Act
await _stormBreaker.ClearAsync("tenant1", "key1");
// Assert
var afterClear = await _stormBreaker.GetStateAsync("tenant1", "key1");
Assert.Null(afterClear);
}
[Fact]
public async Task GenerateSummaryAsync_ActiveStorm_ReturnsSummary()
{
// Arrange
for (int i = 0; i < 15; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Advance some time
_timeProvider.Advance(TimeSpan.FromMinutes(5));
// Act
var summary = await _stormBreaker.GenerateSummaryAsync("tenant1", "key1");
// Assert
Assert.NotNull(summary);
Assert.Equal("tenant1", summary.TenantId);
Assert.Equal("key1", summary.StormKey);
Assert.Equal(15, summary.TotalEvents);
Assert.True(summary.IsOngoing);
Assert.NotNull(summary.SummaryText);
Assert.NotNull(summary.SummaryTitle);
Assert.True(summary.SampleEventIds.Count <= _options.MaxSampleEvents);
}
[Fact]
public async Task GenerateSummaryAsync_NoStorm_ReturnsNull()
{
// Act
var summary = await _stormBreaker.GenerateSummaryAsync("tenant1", "nonexistent");
// Assert
Assert.Null(summary);
}
[Fact]
public async Task EvaluateAsync_EventsOutsideWindow_AreRemoved()
{
// Arrange - add events
for (int i = 0; i < 5; i++)
{
await _stormBreaker.EvaluateAsync("tenant1", "key1", $"event{i}");
}
// Move time past the window
_timeProvider.Advance(TimeSpan.FromMinutes(6));
// Act
var result = await _stormBreaker.EvaluateAsync("tenant1", "key1", "new_event");
// Assert
Assert.False(result.IsStorm);
Assert.Equal(1, result.EventCount); // Only the new event counts
}
[Fact]
public async Task EvaluateAsync_ThresholdOverrides_AppliesCorrectThreshold()
{
// Arrange
var optionsWithOverride = new StormBreakerOptions
{
Enabled = true,
DefaultThreshold = 100,
DefaultWindow = TimeSpan.FromMinutes(5),
ThresholdOverrides = new Dictionary<string, int>
{
["critical.*"] = 5
}
};
var breaker = new InMemoryStormBreaker(
Options.Create(optionsWithOverride),
_timeProvider,
NullLogger<InMemoryStormBreaker>.Instance);
// Act - the override lowers the threshold to 5 for critical.* keys, so the storm is already active when the sixth event is evaluated
for (int i = 0; i < 5; i++)
{
await breaker.EvaluateAsync("tenant1", "critical.alert", $"event{i}");
}
var criticalResult = await breaker.EvaluateAsync("tenant1", "critical.alert", "event5");
// A non-critical key falls back to the default threshold of 100, so six events must not trigger a storm
for (int i = 0; i < 5; i++)
{
await breaker.EvaluateAsync("tenant1", "info.log", $"event{i}");
}
var infoResult = await breaker.EvaluateAsync("tenant1", "info.log", "event5");
// Assert
Assert.True(criticalResult.IsStorm);
Assert.False(infoResult.IsStorm);
}
}


@@ -0,0 +1,345 @@
using System.Collections.Immutable;
using System.Text.Json.Nodes;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Templates;
using Xunit;
namespace StellaOps.Notifier.Tests.Templates;
public sealed class EnhancedTemplateRendererTests
{
private readonly MockTemplateService _templateService;
private readonly EnhancedTemplateRenderer _renderer;
public EnhancedTemplateRendererTests()
{
_templateService = new MockTemplateService();
var options = Options.Create(new TemplateRendererOptions
{
ProvenanceBaseUrl = "https://stellaops.local/notify"
});
_renderer = new EnhancedTemplateRenderer(
_templateService,
options,
NullLogger<EnhancedTemplateRenderer>.Instance);
}
[Fact]
public async Task RenderAsync_SimpleVariables_SubstitutesCorrectly()
{
// Arrange
var template = CreateTemplate("Hello {{actor}}, event {{kind}} occurred.");
var notifyEvent = CreateEvent("pack.approval", "test-user");
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Contains("Hello test-user", result.Body);
Assert.Contains("pack.approval", result.Body);
}
[Fact]
public async Task RenderAsync_NestedPayloadVariables_SubstitutesCorrectly()
{
// Arrange
var template = CreateTemplate("Pack {{pack.id}} version {{pack.version}}");
var payload = new JsonObject
{
["pack"] = new JsonObject
{
["id"] = "pkg-001",
["version"] = "1.2.3"
}
};
var notifyEvent = CreateEvent("pack.approval", "user", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Equal("Pack pkg-001 version 1.2.3", result.Body);
}
[Fact]
public async Task RenderAsync_EachBlock_IteratesArray()
{
// Arrange
var template = CreateTemplate("Items: {{#each items}}[{{this}}]{{/each}}");
var payload = new JsonObject
{
["items"] = new JsonArray { "a", "b", "c" }
};
var notifyEvent = CreateEvent("test.event", "user", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Equal("Items: [a][b][c]", result.Body);
}
[Fact]
public async Task RenderAsync_EachBlockWithProperties_AccessesItemProperties()
{
// Arrange
var template = CreateTemplate("{{#each vulnerabilities}}{{@id}}: {{@severity}} {{/each}}");
var payload = new JsonObject
{
["vulnerabilities"] = new JsonArray
{
new JsonObject { ["id"] = "CVE-001", ["severity"] = "high" },
new JsonObject { ["id"] = "CVE-002", ["severity"] = "low" }
}
};
var notifyEvent = CreateEvent("scan.complete", "scanner", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Equal("CVE-001: high CVE-002: low ", result.Body);
}
[Fact]
public async Task RenderAsync_RedactsSensitiveFields()
{
// Arrange
_templateService.RedactionConfig = new TemplateRedactionConfig
{
AllowedFields = [],
DeniedFields = ["secret", "token"],
Mode = "safe"
};
var template = CreateTemplate("Secret: {{secretKey}}, Token: {{authToken}}, Name: {{name}}");
var payload = new JsonObject
{
["secretKey"] = "super-secret-123",
["authToken"] = "tok-456",
["name"] = "public-name"
};
var notifyEvent = CreateEvent("test.event", "user", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Contains("[REDACTED]", result.Body);
Assert.Contains("public-name", result.Body);
Assert.DoesNotContain("super-secret-123", result.Body);
Assert.DoesNotContain("tok-456", result.Body);
}
[Fact]
public async Task RenderAsync_ParanoidMode_OnlyAllowsExplicitFields()
{
// Arrange
_templateService.RedactionConfig = new TemplateRedactionConfig
{
AllowedFields = ["name"],
DeniedFields = [],
Mode = "paranoid"
};
var template = CreateTemplate("Name: {{name}}, Email: {{email}}");
var payload = new JsonObject
{
["name"] = "John",
["email"] = "john@example.com"
};
var notifyEvent = CreateEvent("test.event", "user", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Contains("Name: John", result.Body);
Assert.Contains("Email: [REDACTED]", result.Body);
}
[Fact]
public async Task RenderAsync_AddsProvenanceLinks()
{
// Arrange
var template = CreateTemplate("Event link: {{provenance.eventUrl}}");
var notifyEvent = CreateEvent("test.event", "user");
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Contains("https://stellaops.local/notify/events/", result.Body);
}
[Fact]
public async Task RenderAsync_FormatSpecifiers_Work()
{
// Arrange
var template = CreateTemplate("Upper: {{name|upper}}, Lower: {{name|lower}}");
var payload = new JsonObject { ["name"] = "Test" };
var notifyEvent = CreateEvent("test.event", "user", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Contains("Upper: TEST", result.Body);
Assert.Contains("Lower: test", result.Body);
}
[Fact]
public async Task RenderAsync_HtmlFormat_EncodesSpecialChars()
{
// Arrange
var template = CreateTemplate("Script: {{code|html}}", NotifyDeliveryFormat.Html);
var payload = new JsonObject { ["code"] = "<script>alert('xss')</script>" };
var notifyEvent = CreateEvent("test.event", "user", payload);
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.DoesNotContain("<script>", result.Body);
Assert.Contains("&lt;script&gt;", result.Body);
}
[Fact]
public async Task RenderAsync_RendersSubject()
{
// Arrange
var metadata = new Dictionary<string, string>
{
["subject"] = "[Alert] {{kind}} from {{actor}}"
};
var template = CreateTemplate("Body content", NotifyDeliveryFormat.PlainText, metadata);
var notifyEvent = CreateEvent("security.alert", "scanner");
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.NotNull(result.Subject);
Assert.Contains("security.alert", result.Subject);
Assert.Contains("scanner", result.Subject);
}
[Fact]
public async Task RenderAsync_ComputesBodyHash()
{
// Arrange
var template = CreateTemplate("Static content");
var notifyEvent = CreateEvent("test.event", "user");
// Act
var result1 = await _renderer.RenderAsync(template, notifyEvent);
var result2 = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.NotNull(result1.BodyHash);
Assert.Equal(64, result1.BodyHash.Length); // SHA-256 hex
Assert.Equal(result1.BodyHash, result2.BodyHash);
}
[Fact]
public async Task RenderAsync_MarkdownToHtml_Converts()
{
// Arrange
var template = NotifyTemplate.Create(
templateId: "tmpl-md",
tenantId: "test-tenant",
channelType: NotifyChannelType.Email,
key: "test.key",
locale: "en-us",
body: "# Header\n**Bold** text",
renderMode: NotifyTemplateRenderMode.Markdown,
format: NotifyDeliveryFormat.Html);
var notifyEvent = CreateEvent("test.event", "user");
// Act
var result = await _renderer.RenderAsync(template, notifyEvent);
// Assert
Assert.Contains("<h1>", result.Body);
Assert.Contains("<strong>", result.Body);
}
[Fact]
public async Task RenderAsync_IfBlock_RendersConditionally()
{
// Arrange
var template = CreateTemplate("{{#if critical}}CRITICAL: {{/if}}Message");
var payloadTrue = new JsonObject { ["critical"] = "true" };
var payloadFalse = new JsonObject { ["critical"] = "" };
var eventTrue = CreateEvent("test", "user", payloadTrue);
var eventFalse = CreateEvent("test", "user", payloadFalse);
// Act
var resultTrue = await _renderer.RenderAsync(template, eventTrue);
var resultFalse = await _renderer.RenderAsync(template, eventFalse);
// Assert
Assert.Contains("CRITICAL:", resultTrue.Body);
Assert.DoesNotContain("CRITICAL:", resultFalse.Body);
}
private static NotifyTemplate CreateTemplate(
string body,
NotifyDeliveryFormat format = NotifyDeliveryFormat.PlainText,
Dictionary<string, string>? metadata = null)
{
return NotifyTemplate.Create(
templateId: "test-template",
tenantId: "test-tenant",
channelType: NotifyChannelType.Webhook,
key: "test.key",
locale: "en-us",
body: body,
format: format,
metadata: metadata);
}
private static NotifyEvent CreateEvent(
string kind,
string actor,
JsonObject? payload = null)
{
return NotifyEvent.Create(
eventId: Guid.NewGuid().ToString(),
tenant: "test-tenant",
kind: kind,
actor: actor,
timestamp: DateTimeOffset.UtcNow,
payload: payload ?? new JsonObject());
}
private sealed class MockTemplateService : INotifyTemplateService
{
public TemplateRedactionConfig RedactionConfig { get; set; } = TemplateRedactionConfig.Default;
public Task<NotifyTemplate?> ResolveAsync(string tenantId, string key, NotifyChannelType channelType, string locale, CancellationToken cancellationToken = default)
=> Task.FromResult<NotifyTemplate?>(null);
public Task<NotifyTemplate?> GetByIdAsync(string tenantId, string templateId, CancellationToken cancellationToken = default)
=> Task.FromResult<NotifyTemplate?>(null);
public Task<TemplateUpsertResult> UpsertAsync(NotifyTemplate template, string actor, CancellationToken cancellationToken = default)
=> Task.FromResult(TemplateUpsertResult.Created(template.TemplateId));
public Task<bool> DeleteAsync(string tenantId, string templateId, string actor, CancellationToken cancellationToken = default)
=> Task.FromResult(true);
public Task<IReadOnlyList<NotifyTemplate>> ListAsync(string tenantId, TemplateListOptions? options = null, CancellationToken cancellationToken = default)
=> Task.FromResult<IReadOnlyList<NotifyTemplate>>([]);
public TemplateValidationResult Validate(string templateBody)
=> TemplateValidationResult.Valid();
public TemplateRedactionConfig GetRedactionConfig(NotifyTemplate template)
=> RedactionConfig;
}
}

@@ -0,0 +1,340 @@
using Microsoft.Extensions.Logging.Abstractions;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Templates;
using Xunit;
namespace StellaOps.Notifier.Tests.Templates;
public sealed class NotifyTemplateServiceTests
{
private readonly InMemoryTemplateRepository _templateRepository;
private readonly InMemoryAuditRepository _auditRepository;
private readonly NotifyTemplateService _service;
public NotifyTemplateServiceTests()
{
_templateRepository = new InMemoryTemplateRepository();
_auditRepository = new InMemoryAuditRepository();
_service = new NotifyTemplateService(
_templateRepository,
_auditRepository,
NullLogger<NotifyTemplateService>.Instance);
}
[Fact]
public async Task ResolveAsync_ExactLocaleMatch_ReturnsTemplate()
{
// Arrange
var template = CreateTemplate("tmpl-001", "pack.approval", "en-us");
await _templateRepository.UpsertAsync(template);
// Act - request "en-US" while the stored locale is "en-us"; the match is expected to ignore case
var result = await _service.ResolveAsync(
"test-tenant", "pack.approval", NotifyChannelType.Webhook, "en-US");
// Assert
Assert.NotNull(result);
Assert.Equal("tmpl-001", result.TemplateId);
}
[Fact]
public async Task ResolveAsync_FallbackToLanguageOnly_ReturnsTemplate()
{
// Arrange
var template = CreateTemplate("tmpl-en", "pack.approval", "en");
await _templateRepository.UpsertAsync(template);
// Act - request en-GB but only en exists
var result = await _service.ResolveAsync(
"test-tenant", "pack.approval", NotifyChannelType.Webhook, "en-GB");
// Assert
Assert.NotNull(result);
Assert.Equal("tmpl-en", result.TemplateId);
}
[Fact]
public async Task ResolveAsync_FallbackToDefault_ReturnsTemplate()
{
// Arrange
var template = CreateTemplate("tmpl-default", "pack.approval", "en-us");
await _templateRepository.UpsertAsync(template);
// Act - request de-DE but only en-us exists (default)
var result = await _service.ResolveAsync(
"test-tenant", "pack.approval", NotifyChannelType.Webhook, "de-DE");
// Assert
Assert.NotNull(result);
Assert.Equal("tmpl-default", result.TemplateId);
}
[Fact]
public async Task ResolveAsync_NoMatch_ReturnsNull()
{
// Act
var result = await _service.ResolveAsync(
"test-tenant", "nonexistent.key", NotifyChannelType.Webhook, "en-US");
// Assert
Assert.Null(result);
}
[Fact]
public async Task UpsertAsync_NewTemplate_CreatesAndAudits()
{
// Arrange
var template = CreateTemplate("tmpl-new", "pack.approval", "en-us");
// Act
var result = await _service.UpsertAsync(template, "test-actor");
// Assert
Assert.True(result.Success);
Assert.True(result.IsNew);
Assert.Equal("tmpl-new", result.TemplateId);
var audit = _auditRepository.Entries.Single();
Assert.Equal("template.created", audit.EventType);
Assert.Equal("test-actor", audit.Actor);
}
[Fact]
public async Task UpsertAsync_ExistingTemplate_UpdatesAndAudits()
{
// Arrange
var original = CreateTemplate("tmpl-existing", "pack.approval", "en-us", "Original body");
await _templateRepository.UpsertAsync(original);
_auditRepository.Entries.Clear();
var updated = CreateTemplate("tmpl-existing", "pack.approval", "en-us", "Updated body");
// Act
var result = await _service.UpsertAsync(updated, "another-actor");
// Assert
Assert.True(result.Success);
Assert.False(result.IsNew);
var audit = _auditRepository.Entries.Single();
Assert.Equal("template.updated", audit.EventType);
Assert.Equal("another-actor", audit.Actor);
}
[Fact]
public async Task UpsertAsync_InvalidTemplate_ReturnsError()
{
// Arrange - template with mismatched braces
var template = NotifyTemplate.Create(
templateId: "tmpl-invalid",
tenantId: "test-tenant",
channelType: NotifyChannelType.Webhook,
key: "test.key",
locale: "en-us",
body: "Hello {{name} - missing closing brace");
// Act
var result = await _service.UpsertAsync(template, "test-actor");
// Assert
Assert.False(result.Success);
Assert.Contains("braces", result.Error!, StringComparison.OrdinalIgnoreCase);
}
[Fact]
public async Task DeleteAsync_ExistingTemplate_DeletesAndAudits()
{
// Arrange
var template = CreateTemplate("tmpl-delete", "pack.approval", "en-us");
await _templateRepository.UpsertAsync(template);
// Act
var deleted = await _service.DeleteAsync("test-tenant", "tmpl-delete", "delete-actor");
// Assert
Assert.True(deleted);
Assert.Null(await _templateRepository.GetAsync("test-tenant", "tmpl-delete"));
var audit = _auditRepository.Entries.Last();
Assert.Equal("template.deleted", audit.EventType);
}
[Fact]
public async Task DeleteAsync_NonexistentTemplate_ReturnsFalse()
{
// Act
var deleted = await _service.DeleteAsync("test-tenant", "nonexistent", "actor");
// Assert
Assert.False(deleted);
}
[Fact]
public async Task ListAsync_WithFilters_ReturnsFilteredResults()
{
// Arrange
await _templateRepository.UpsertAsync(CreateTemplate("tmpl-1", "pack.approval", "en-us"));
await _templateRepository.UpsertAsync(CreateTemplate("tmpl-2", "pack.approval", "de-de"));
await _templateRepository.UpsertAsync(CreateTemplate("tmpl-3", "risk.alert", "en-us"));
// Act
var results = await _service.ListAsync("test-tenant", new TemplateListOptions
{
KeyPrefix = "pack.",
Locale = "en-us"
});
// Assert
Assert.Single(results);
Assert.Equal("tmpl-1", results[0].TemplateId);
}
[Fact]
public void Validate_ValidTemplate_ReturnsValid()
{
// Act
var result = _service.Validate("Hello {{name}}, your order {{orderId}} is ready.");
// Assert
Assert.True(result.IsValid);
Assert.Empty(result.Errors);
}
[Fact]
public void Validate_MismatchedBraces_ReturnsInvalid()
{
// Act
var result = _service.Validate("Hello {{name}, missing close");
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Contains("braces", StringComparison.OrdinalIgnoreCase));
}
[Fact]
public void Validate_UnclosedEachBlock_ReturnsInvalid()
{
// Act
var result = _service.Validate("{{#each items}}{{this}}");
// Assert
Assert.False(result.IsValid);
Assert.Contains(result.Errors, e => e.Contains("#each"));
}
[Fact]
public void Validate_SensitiveVariable_ReturnsWarning()
{
// Act
var result = _service.Validate("Your API key is: {{apiKey}}");
// Assert
Assert.True(result.IsValid);
Assert.Contains(result.Warnings, w => w.Contains("sensitive"));
}
[Fact]
public void GetRedactionConfig_DefaultMode_ReturnsSafeDefaults()
{
// Arrange
var template = CreateTemplate("tmpl-001", "test.key", "en-us");
// Act
var config = _service.GetRedactionConfig(template);
// Assert
Assert.Equal("safe", config.Mode);
Assert.Contains("secret", config.DeniedFields);
Assert.Contains("password", config.DeniedFields);
}
[Fact]
public void GetRedactionConfig_ParanoidMode_RequiresAllowlist()
{
// Arrange
var template = NotifyTemplate.Create(
templateId: "tmpl-paranoid",
tenantId: "test-tenant",
channelType: NotifyChannelType.Webhook,
key: "test.key",
locale: "en-us",
body: "Test body",
metadata: new Dictionary<string, string>
{
["redaction"] = "paranoid",
["redaction.allow"] = "name,email"
});
// Act
var config = _service.GetRedactionConfig(template);
// Assert
Assert.Equal("paranoid", config.Mode);
Assert.Contains("name", config.AllowedFields);
Assert.Contains("email", config.AllowedFields);
}
private static NotifyTemplate CreateTemplate(
string templateId,
string key,
string locale,
string body = "Test body {{variable}}")
{
return NotifyTemplate.Create(
templateId: templateId,
tenantId: "test-tenant",
channelType: NotifyChannelType.Webhook,
key: key,
locale: locale,
body: body);
}
private sealed class InMemoryTemplateRepository : StellaOps.Notify.Storage.Mongo.Repositories.INotifyTemplateRepository
{
private readonly Dictionary<string, NotifyTemplate> _templates = new();
public Task UpsertAsync(NotifyTemplate template, CancellationToken cancellationToken = default)
{
var key = $"{template.TenantId}:{template.TemplateId}";
_templates[key] = template;
return Task.CompletedTask;
}
public Task<NotifyTemplate?> GetAsync(string tenantId, string templateId, CancellationToken cancellationToken = default)
{
var key = $"{tenantId}:{templateId}";
return Task.FromResult(_templates.GetValueOrDefault(key));
}
public Task<IReadOnlyList<NotifyTemplate>> ListAsync(string tenantId, CancellationToken cancellationToken = default)
{
var result = _templates.Values
.Where(t => t.TenantId == tenantId)
.ToList();
return Task.FromResult<IReadOnlyList<NotifyTemplate>>(result);
}
public Task DeleteAsync(string tenantId, string templateId, CancellationToken cancellationToken = default)
{
var key = $"{tenantId}:{templateId}";
_templates.Remove(key);
return Task.CompletedTask;
}
}
private sealed class InMemoryAuditRepository : StellaOps.Notify.Storage.Mongo.Repositories.INotifyAuditRepository
{
public List<(string TenantId, string EventType, string Actor, IReadOnlyDictionary<string, string> Metadata)> Entries { get; } = [];
public Task AppendAsync(
string tenantId,
string eventType,
string actor,
IReadOnlyDictionary<string, string> metadata,
CancellationToken cancellationToken)
{
Entries.Add((tenantId, eventType, actor, metadata));
return Task.CompletedTask;
}
}
}

@@ -0,0 +1,436 @@
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notifier.Worker.Tenancy;
using Xunit;
namespace StellaOps.Notifier.Tests.Tenancy;
public sealed class TenantChannelResolverTests
{
private static DefaultTenantChannelResolver CreateResolver(
ITenantContextAccessor? accessor = null,
TenantChannelResolverOptions? options = null)
{
accessor ??= new TenantContextAccessor();
options ??= new TenantChannelResolverOptions();
return new DefaultTenantChannelResolver(
accessor,
Options.Create(options),
NullLogger<DefaultTenantChannelResolver>.Instance);
}
[Fact]
public void Resolve_SimpleReference_UsesCurrentTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var resolver = CreateResolver(accessor);
// Act
var result = resolver.Resolve("slack-alerts");
// Assert
result.Success.Should().BeTrue();
result.TenantId.Should().Be("tenant-a");
result.ChannelId.Should().Be("slack-alerts");
result.ScopedId.Should().Be("tenant-a:slack-alerts");
result.IsCrossTenant.Should().BeFalse();
result.IsGlobalChannel.Should().BeFalse();
}
[Fact]
public void Resolve_QualifiedReference_UsesSpecifiedTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantChannelResolverOptions { AllowCrossTenant = true };
var resolver = CreateResolver(accessor, options);
// Act
var result = resolver.Resolve("tenant-b:email-channel");
// Assert
result.Success.Should().BeTrue();
result.TenantId.Should().Be("tenant-b");
result.ChannelId.Should().Be("email-channel");
result.ScopedId.Should().Be("tenant-b:email-channel");
result.IsCrossTenant.Should().BeTrue();
}
[Fact]
public void Resolve_CrossTenantReference_DeniedByDefault()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantChannelResolverOptions { AllowCrossTenant = false };
var resolver = CreateResolver(accessor, options);
// Act
var result = resolver.Resolve("tenant-b:email-channel");
// Assert
result.Success.Should().BeFalse();
result.Error.Should().Contain("Cross-tenant");
}
[Fact]
public void Resolve_SameTenantQualified_NotCrossTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var resolver = CreateResolver(accessor);
// Act
var result = resolver.Resolve("tenant-a:slack-channel");
// Assert
result.Success.Should().BeTrue();
result.IsCrossTenant.Should().BeFalse();
}
[Fact]
public void Resolve_GlobalPrefix_ResolvesToGlobalTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantChannelResolverOptions
{
GlobalPrefix = "@global",
GlobalTenant = "system",
AllowGlobalChannels = true
};
var resolver = CreateResolver(accessor, options);
// Act
var result = resolver.Resolve("@global:broadcast");
// Assert
result.Success.Should().BeTrue();
result.TenantId.Should().Be("system");
result.ChannelId.Should().Be("broadcast");
result.IsGlobalChannel.Should().BeTrue();
result.IsCrossTenant.Should().BeTrue();
}
[Fact]
public void Resolve_GlobalChannels_DeniedWhenDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantChannelResolverOptions { AllowGlobalChannels = false };
var resolver = CreateResolver(accessor, options);
// Act
var result = resolver.Resolve("@global:broadcast");
// Assert
result.Success.Should().BeFalse();
result.Error.Should().Contain("Global channels are not allowed");
}
[Fact]
public void Resolve_GlobalChannelPattern_MatchesPatterns()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantChannelResolverOptions
{
GlobalChannelPatterns = ["system-*", "shared-*"],
GlobalTenant = "system"
};
var resolver = CreateResolver(accessor, options);
// Act
var result = resolver.Resolve("system-alerts");
// Assert
result.Success.Should().BeTrue();
result.TenantId.Should().Be("system");
result.IsGlobalChannel.Should().BeTrue();
}
[Fact]
public void Resolve_NoTenantContext_Fails()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var result = resolver.Resolve("any-channel");
// Assert
result.Success.Should().BeFalse();
result.Error.Should().Contain("No tenant context");
}
[Fact]
public void Resolve_WithExplicitTenantId_DoesNotRequireContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var result = resolver.Resolve("slack-alerts", "explicit-tenant");
// Assert
result.Success.Should().BeTrue();
result.TenantId.Should().Be("explicit-tenant");
result.ChannelId.Should().Be("slack-alerts");
}
[Fact]
public void Resolve_EmptyReference_Fails()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var resolver = CreateResolver(accessor);
// Act
var result = resolver.Resolve("");
// Assert
result.Success.Should().BeFalse();
result.Error.Should().Contain("empty");
}
[Fact]
public void CreateQualifiedReference_CreatesCorrectFormat()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var reference = resolver.CreateQualifiedReference("tenant-xyz", "channel-abc");
// Assert
reference.Should().Be("tenant-xyz:channel-abc");
}
[Fact]
public void Parse_SimpleReference_ParsesCorrectly()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var components = resolver.Parse("slack-alerts");
// Assert
components.HasTenantPrefix.Should().BeFalse();
components.TenantId.Should().BeNull();
components.ChannelId.Should().Be("slack-alerts");
components.IsGlobal.Should().BeFalse();
}
[Fact]
public void Parse_QualifiedReference_ParsesCorrectly()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var components = resolver.Parse("tenant-a:email-channel");
// Assert
components.HasTenantPrefix.Should().BeTrue();
components.TenantId.Should().Be("tenant-a");
components.ChannelId.Should().Be("email-channel");
components.IsGlobal.Should().BeFalse();
}
[Fact]
public void Parse_GlobalReference_ParsesCorrectly()
{
// Arrange
var accessor = new TenantContextAccessor();
var options = new TenantChannelResolverOptions { GlobalPrefix = "@global" };
var resolver = CreateResolver(accessor, options);
// Act
var components = resolver.Parse("@global:broadcast");
// Assert
components.IsGlobal.Should().BeTrue();
components.ChannelId.Should().Be("broadcast");
components.HasTenantPrefix.Should().BeFalse();
}
[Fact]
public void IsValidReference_ValidSimpleReference()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var isValid = resolver.IsValidReference("slack-alerts");
// Assert
isValid.Should().BeTrue();
}
[Fact]
public void IsValidReference_ValidQualifiedReference()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var isValid = resolver.IsValidReference("tenant-a:channel-1");
// Assert
isValid.Should().BeTrue();
}
[Fact]
public void IsValidReference_InvalidCharacters()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var isValid = resolver.IsValidReference("channel@invalid");
// Assert
isValid.Should().BeFalse();
}
[Fact]
public void IsValidReference_EmptyReference()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = CreateResolver(accessor);
// Act
var isValid = resolver.IsValidReference("");
// Assert
isValid.Should().BeFalse();
}
[Fact]
public void GetFallbackReferences_IncludesGlobalFallback()
{
// Arrange
var accessor = new TenantContextAccessor();
var options = new TenantChannelResolverOptions
{
FallbackToGlobal = true,
GlobalPrefix = "@global"
};
var resolver = CreateResolver(accessor, options);
// Act
var fallbacks = resolver.GetFallbackReferences("slack-alerts");
// Assert
fallbacks.Should().HaveCount(2);
fallbacks[0].Should().Be("slack-alerts");
fallbacks[1].Should().Contain("@global");
}
[Fact]
public void GetFallbackReferences_NoGlobalFallbackWhenDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
var options = new TenantChannelResolverOptions { FallbackToGlobal = false };
var resolver = CreateResolver(accessor, options);
// Act
var fallbacks = resolver.GetFallbackReferences("slack-alerts");
// Assert
fallbacks.Should().HaveCount(1);
fallbacks[0].Should().Be("slack-alerts");
}
}
public sealed class TenantChannelResolutionTests
{
[Fact]
public void Successful_CreatesSuccessResult()
{
// Arrange & Act
var result = TenantChannelResolution.Successful(
"tenant-a", "channel-1", "channel-1");
// Assert
result.Success.Should().BeTrue();
result.TenantId.Should().Be("tenant-a");
result.ChannelId.Should().Be("channel-1");
result.ScopedId.Should().Be("tenant-a:channel-1");
result.Error.Should().BeNull();
}
[Fact]
public void Failed_CreatesFailedResult()
{
// Arrange & Act
var result = TenantChannelResolution.Failed("bad-ref", "Invalid channel reference");
// Assert
result.Success.Should().BeFalse();
result.TenantId.Should().BeEmpty();
result.ChannelId.Should().BeEmpty();
result.Error.Should().Be("Invalid channel reference");
}
}
public sealed class TenantChannelResolverExtensionsTests
{
[Fact]
public void ResolveRequired_ThrowsOnFailure()
{
// Arrange
var accessor = new TenantContextAccessor();
var resolver = new DefaultTenantChannelResolver(
accessor,
Options.Create(new TenantChannelResolverOptions()),
NullLogger<DefaultTenantChannelResolver>.Instance);
// Act
var act = () => resolver.ResolveRequired("any-channel");
// Assert
act.Should().Throw<InvalidOperationException>()
.WithMessage("*Failed to resolve*");
}
[Fact]
public void ResolveRequired_ReturnsResultOnSuccess()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var resolver = new DefaultTenantChannelResolver(
accessor,
Options.Create(new TenantChannelResolverOptions()),
NullLogger<DefaultTenantChannelResolver>.Instance);
// Act
var result = resolver.ResolveRequired("slack-alerts");
// Assert
result.Success.Should().BeTrue();
result.ChannelId.Should().Be("slack-alerts");
}
}

@@ -0,0 +1,242 @@
using FluentAssertions;
using StellaOps.Notifier.Worker.Tenancy;
using Xunit;
namespace StellaOps.Notifier.Tests.Tenancy;
public sealed class TenantContextTests
{
[Fact]
public void FromHeaders_CreatesValidContext()
{
// Arrange & Act
var context = TenantContext.FromHeaders(
tenantId: "tenant-123",
actor: "user@test.com",
correlationId: "corr-456");
// Assert
context.TenantId.Should().Be("tenant-123");
context.Actor.Should().Be("user@test.com");
context.CorrelationId.Should().Be("corr-456");
context.Source.Should().Be(TenantContextSource.HttpHeader);
context.IsSystemContext.Should().BeFalse();
context.Claims.Should().BeEmpty();
}
[Fact]
public void FromHeaders_UsesDefaultActorWhenEmpty()
{
// Arrange & Act
var context = TenantContext.FromHeaders(
tenantId: "tenant-123",
actor: null,
correlationId: null);
// Assert
context.Actor.Should().Be("api");
context.CorrelationId.Should().BeNull();
}
[Fact]
public void FromEvent_CreatesContextFromEventSource()
{
// Arrange & Act
var context = TenantContext.FromEvent(
tenantId: "tenant-event",
actor: "scheduler",
correlationId: "event-corr");
// Assert
context.TenantId.Should().Be("tenant-event");
context.Source.Should().Be(TenantContextSource.EventContext);
context.IsSystemContext.Should().BeFalse();
}
[Fact]
public void System_CreatesSystemContext()
{
// Arrange & Act
var context = TenantContext.System("system-tenant");
// Assert
context.TenantId.Should().Be("system-tenant");
context.Actor.Should().Be("system");
context.IsSystemContext.Should().BeTrue();
context.Source.Should().Be(TenantContextSource.System);
}
[Fact]
public void WithClaim_AddsClaim()
{
// Arrange
var context = TenantContext.FromHeaders("tenant-1", "user", null);
// Act
var result = context.WithClaim("role", "admin");
// Assert
result.Claims.Should().ContainKey("role");
result.Claims["role"].Should().Be("admin");
}
[Fact]
public void WithClaims_AddsMultipleClaims()
{
// Arrange
var context = TenantContext.FromHeaders("tenant-1", "user", null);
var claims = new Dictionary<string, string>
{
["role"] = "admin",
["department"] = "engineering"
};
// Act
var result = context.WithClaims(claims);
// Assert
result.Claims.Should().HaveCount(2);
result.Claims["role"].Should().Be("admin");
result.Claims["department"].Should().Be("engineering");
}
}
public sealed class TenantContextAccessorTests
{
[Fact]
public void Context_ReturnsNullWhenNotSet()
{
// Arrange
var accessor = new TenantContextAccessor();
// Act & Assert
accessor.Context.Should().BeNull();
accessor.TenantId.Should().BeNull();
accessor.HasContext.Should().BeFalse();
}
[Fact]
public void Context_CanBeSetAndRetrieved()
{
// Arrange
var accessor = new TenantContextAccessor();
var context = TenantContext.FromHeaders("tenant-abc", "user", "corr");
// Act
accessor.Context = context;
// Assert
accessor.Context.Should().Be(context);
accessor.TenantId.Should().Be("tenant-abc");
accessor.HasContext.Should().BeTrue();
}
[Fact]
public void RequiredTenantId_ThrowsWhenNoContext()
{
// Arrange
var accessor = new TenantContextAccessor();
// Act
var act = () => accessor.RequiredTenantId;
// Assert
act.Should().Throw<InvalidOperationException>()
.WithMessage("*tenant context*");
}
[Fact]
public void RequiredTenantId_ReturnsTenantIdWhenSet()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-xyz", "user", null);
// Act
var tenantId = accessor.RequiredTenantId;
// Assert
tenantId.Should().Be("tenant-xyz");
}
[Fact]
public void Context_CanBeCleared()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-abc", "user", null);
// Act
accessor.Context = null;
// Assert
accessor.HasContext.Should().BeFalse();
accessor.TenantId.Should().BeNull();
}
}
public sealed class TenantContextScopeTests
{
[Fact]
public void Scope_SetsContextForDuration()
{
// Arrange
var accessor = new TenantContextAccessor();
var originalContext = TenantContext.FromHeaders("original-tenant", "user", null);
var scopedContext = TenantContext.FromHeaders("scoped-tenant", "scoped-user", null);
accessor.Context = originalContext;
// Act & Assert
using (new TenantContextScope(accessor, scopedContext))
{
accessor.TenantId.Should().Be("scoped-tenant");
}
// After scope, original context restored
accessor.TenantId.Should().Be("original-tenant");
}
[Fact]
public void Scope_RestoresNullContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var scopedContext = TenantContext.FromHeaders("scoped-tenant", "user", null);
// Act & Assert
using (new TenantContextScope(accessor, scopedContext))
{
accessor.TenantId.Should().Be("scoped-tenant");
}
accessor.HasContext.Should().BeFalse();
}
[Fact]
public void Create_CreatesScope()
{
// Arrange
var accessor = new TenantContextAccessor();
// Act
using var scope = TenantContextScope.Create(accessor, "temp-tenant", "temp-actor");
// Assert
accessor.TenantId.Should().Be("temp-tenant");
accessor.Context!.Actor.Should().Be("temp-actor");
}
[Fact]
public void CreateSystem_CreatesSystemScope()
{
// Arrange
var accessor = new TenantContextAccessor();
// Act
using var scope = TenantContextScope.CreateSystem(accessor, "system-tenant");
// Assert
accessor.TenantId.Should().Be("system-tenant");
accessor.Context!.IsSystemContext.Should().BeTrue();
}
}


@@ -0,0 +1,462 @@
using System.Text.Json;
using FluentAssertions;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notifier.Worker.Tenancy;
using Xunit;
namespace StellaOps.Notifier.Tests.Tenancy;
public sealed class TenantMiddlewareTests
{
private static (TenantMiddleware Middleware, TenantContextAccessor Accessor) CreateMiddleware(
RequestDelegate? next = null,
TenantMiddlewareOptions? options = null)
{
var accessor = new TenantContextAccessor();
options ??= new TenantMiddlewareOptions();
next ??= _ => Task.CompletedTask;
var middleware = new TenantMiddleware(
next,
accessor,
Options.Create(options),
NullLogger<TenantMiddleware>.Instance);
return (middleware, accessor);
}
private static HttpContext CreateHttpContext(
string path = "/api/v1/test",
Dictionary<string, string>? headers = null,
Dictionary<string, string>? query = null)
{
var context = new DefaultHttpContext();
context.Request.Path = path;
if (headers != null)
{
foreach (var (key, value) in headers)
{
context.Request.Headers[key] = value;
}
}
if (query != null)
{
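// Escape keys and values so reserved characters cannot corrupt the query string.
var queryString = string.Join("&", query.Select(kvp => $"{Uri.EscapeDataString(kvp.Key)}={Uri.EscapeDataString(kvp.Value)}"));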
context.Request.QueryString = new QueryString($"?{queryString}");
}
context.Response.Body = new MemoryStream();
return context;
}
[Fact]
public async Task InvokeAsync_ExtractsTenantFromHeader()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
// Capture inside the pipeline: the context is cleared after the middleware completes.
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant-123",
["X-StellaOps-Actor"] = "user@test.com",
["X-Correlation-Id"] = "corr-456"
});
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.TenantId.Should().Be("tenant-123");
capturedContext.Actor.Should().Be("user@test.com");
capturedContext.CorrelationId.Should().Be("corr-456");
}
[Fact]
public async Task InvokeAsync_ReturnsBadRequest_WhenTenantMissingAndRequired()
{
// Arrange
var (middleware, _) = CreateMiddleware(options: new TenantMiddlewareOptions
{
RequireTenant = true
});
var context = CreateHttpContext();
// Act
await middleware.InvokeAsync(context);
// Assert
context.Response.StatusCode.Should().Be(StatusCodes.Status400BadRequest);
context.Response.Body.Seek(0, SeekOrigin.Begin);
var body = await new StreamReader(context.Response.Body).ReadToEndAsync();
body.Should().Contain("tenant_missing");
}
[Fact]
public async Task InvokeAsync_ContinuesWithoutTenant_WhenNotRequired()
{
// Arrange
var nextCalled = false;
var (middleware, _) = CreateMiddleware(
next: _ => { nextCalled = true; return Task.CompletedTask; },
options: new TenantMiddlewareOptions { RequireTenant = false });
var context = CreateHttpContext();
// Act
await middleware.InvokeAsync(context);
// Assert
nextCalled.Should().BeTrue();
context.Response.StatusCode.Should().Be(StatusCodes.Status200OK);
}
[Fact]
public async Task InvokeAsync_SkipsExcludedPaths()
{
// Arrange
var nextCalled = false;
var (middleware, accessor) = CreateMiddleware(
next: _ => { nextCalled = true; return Task.CompletedTask; },
options: new TenantMiddlewareOptions
{
RequireTenant = true,
ExcludedPaths = ["/healthz", "/metrics"]
});
var context = CreateHttpContext(path: "/healthz");
// Act
await middleware.InvokeAsync(context);
// Assert
nextCalled.Should().BeTrue();
context.Response.StatusCode.Should().Be(StatusCodes.Status200OK);
}
[Fact]
public async Task InvokeAsync_ReturnsBadRequest_ForInvalidTenantId()
{
// Arrange
var (middleware, _) = CreateMiddleware();
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant@invalid#chars!"
});
// Act
await middleware.InvokeAsync(context);
// Assert
context.Response.StatusCode.Should().Be(StatusCodes.Status400BadRequest);
context.Response.Body.Seek(0, SeekOrigin.Begin);
var body = await new StreamReader(context.Response.Body).ReadToEndAsync();
body.Should().Contain("tenant_invalid");
}
[Fact]
public async Task InvokeAsync_RejectsTenantId_TooShort()
{
// Arrange
var (middleware, _) = CreateMiddleware(options: new TenantMiddlewareOptions
{
MinTenantIdLength = 5
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "abc"
});
// Act
await middleware.InvokeAsync(context);
// Assert
context.Response.StatusCode.Should().Be(StatusCodes.Status400BadRequest);
}
[Fact]
public async Task InvokeAsync_RejectsTenantId_TooLong()
{
// Arrange
var (middleware, _) = CreateMiddleware(options: new TenantMiddlewareOptions
{
MaxTenantIdLength = 10
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "very-long-tenant-id-exceeding-max"
});
// Act
await middleware.InvokeAsync(context);
// Assert
context.Response.StatusCode.Should().Be(StatusCodes.Status400BadRequest);
}
[Fact]
public async Task InvokeAsync_ExtractsTenantFromQueryParam_ForWebSocket()
{
// Arrange
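var nextCalled = false;
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
nextCalled = true;
// Capture inside the pipeline: the context is cleared after the middleware completes.
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(
path: "/api/v2/incidents/live",
query: new Dictionary<string, string> { ["tenant"] = "websocket-tenant" });
// Act
await middleware.InvokeAsync(context);
// Assert
nextCalled.Should().BeTrue();
capturedContext.Should().NotBeNull();
capturedContext!.TenantId.Should().Be("websocket-tenant");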
}
[Fact]
public async Task InvokeAsync_PrefersHeaderOverQueryParam()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(
headers: new Dictionary<string, string> { ["X-StellaOps-Tenant"] = "header-tenant" },
query: new Dictionary<string, string> { ["tenant"] = "query-tenant" });
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.TenantId.Should().Be("header-tenant");
}
[Fact]
public async Task InvokeAsync_UsesCustomHeaderNames()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(
next: ctx => { capturedContext = accessor.Context; return Task.CompletedTask; },
options: new TenantMiddlewareOptions
{
TenantHeader = "X-Custom-Tenant",
ActorHeader = "X-Custom-Actor",
CorrelationHeader = "X-Custom-Correlation"
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-Custom-Tenant"] = "custom-tenant",
["X-Custom-Actor"] = "custom-actor",
["X-Custom-Correlation"] = "custom-corr"
});
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.TenantId.Should().Be("custom-tenant");
capturedContext.Actor.Should().Be("custom-actor");
capturedContext.CorrelationId.Should().Be("custom-corr");
}
[Fact]
public async Task InvokeAsync_SetsDefaultActor_WhenNotProvided()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant-123"
});
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.Actor.Should().Be("api");
}
[Fact]
public async Task InvokeAsync_UsesTraceIdentifier_ForCorrelationId_WhenNotProvided()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant-123"
});
context.TraceIdentifier = "test-trace-id";
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.CorrelationId.Should().Be("test-trace-id");
}
[Fact]
public async Task InvokeAsync_AddsTenantIdToResponseHeaders()
{
// Arrange
var (middleware, _) = CreateMiddleware();
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "response-tenant"
});
// Act: run the middleware, then start the response so the OnStarting callbacks fire
await middleware.InvokeAsync(context);
await context.Response.StartAsync();
// Assert
context.Response.Headers["X-Tenant-Id"].ToString().Should().Be("response-tenant");
}
[Fact]
public async Task InvokeAsync_ClearsContextAfterRequest()
{
// Arrange
var (middleware, accessor) = CreateMiddleware();
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant-123"
});
// Act
await middleware.InvokeAsync(context);
// Assert
accessor.HasContext.Should().BeFalse();
accessor.Context.Should().BeNull();
}
[Fact]
public async Task InvokeAsync_AllowsHyphenAndUnderscore_InTenantId()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant-123_abc"
});
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.TenantId.Should().Be("tenant-123_abc");
}
[Fact]
public async Task InvokeAsync_SetsSource_ToHttpHeader()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(headers: new Dictionary<string, string>
{
["X-StellaOps-Tenant"] = "tenant-123"
});
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.Source.Should().Be(TenantContextSource.HttpHeader);
}
[Fact]
public async Task InvokeAsync_SetsSource_ToQueryParameter_ForWebSocket()
{
// Arrange
TenantContext? capturedContext = null;
var (middleware, accessor) = CreateMiddleware(next: ctx =>
{
capturedContext = accessor.Context;
return Task.CompletedTask;
});
var context = CreateHttpContext(
path: "/api/live",
query: new Dictionary<string, string> { ["tenant"] = "ws-tenant" });
// Act
await middleware.InvokeAsync(context);
// Assert
capturedContext.Should().NotBeNull();
capturedContext!.Source.Should().Be(TenantContextSource.QueryParameter);
}
}
public sealed class TenantMiddlewareOptionsTests
{
[Fact]
public void DefaultValues_AreCorrect()
{
// Arrange & Act
var options = new TenantMiddlewareOptions();
// Assert
options.TenantHeader.Should().Be("X-StellaOps-Tenant");
options.ActorHeader.Should().Be("X-StellaOps-Actor");
options.CorrelationHeader.Should().Be("X-Correlation-Id");
options.RequireTenant.Should().BeTrue();
options.MinTenantIdLength.Should().Be(1);
options.MaxTenantIdLength.Should().Be(128);
options.ExcludedPaths.Should().Contain("/healthz");
options.ExcludedPaths.Should().Contain("/metrics");
}
}


@@ -0,0 +1,486 @@
using System.Text.Json.Nodes;
using FluentAssertions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Tenancy;
using Xunit;
namespace StellaOps.Notifier.Tests.Tenancy;
public sealed class TenantNotificationEnricherTests
{
private static DefaultTenantNotificationEnricher CreateEnricher(
ITenantContextAccessor? accessor = null,
TenantNotificationEnricherOptions? options = null,
TimeProvider? timeProvider = null)
{
accessor ??= new TenantContextAccessor();
options ??= new TenantNotificationEnricherOptions();
timeProvider ??= TimeProvider.System;
return new DefaultTenantNotificationEnricher(
accessor,
Options.Create(options),
timeProvider);
}
[Fact]
public void Enrich_AddsTenantInfoToPayload()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user@test.com", "corr-456");
var fakeTime = new FakeTimeProvider(new DateTimeOffset(2024, 1, 15, 10, 30, 0, TimeSpan.Zero));
var enricher = CreateEnricher(accessor, timeProvider: fakeTime);
var payload = new JsonObject
{
["eventType"] = "test.event",
["data"] = new JsonObject { ["key"] = "value" }
};
// Act
var result = enricher.Enrich(payload);
// Assert
result.Should().ContainKey("_tenant");
var tenant = result["_tenant"]!.AsObject();
tenant["id"]!.GetValue<string>().Should().Be("tenant-123");
tenant["actor"]!.GetValue<string>().Should().Be("user@test.com");
tenant["correlationId"]!.GetValue<string>().Should().Be("corr-456");
tenant["source"]!.GetValue<string>().Should().Be("HttpHeader");
tenant.Should().ContainKey("timestamp");
}
[Fact]
public void Enrich_ReturnsUnmodifiedPayloadWhenNoContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
var payload = new JsonObject
{
["eventType"] = "test.event"
};
// Act
var result = enricher.Enrich(payload);
// Assert
result.Should().NotContainKey("_tenant");
result["eventType"]!.GetValue<string>().Should().Be("test.event");
}
[Fact]
public void Enrich_SkipsWhenIncludeInPayloadDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var options = new TenantNotificationEnricherOptions { IncludeInPayload = false };
var enricher = CreateEnricher(accessor, options);
var payload = new JsonObject { ["data"] = "test" };
// Act
var result = enricher.Enrich(payload);
// Assert
result.Should().NotContainKey("_tenant");
}
[Fact]
public void Enrich_UsesCustomPropertyName()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var options = new TenantNotificationEnricherOptions { PayloadPropertyName = "tenantContext" };
var enricher = CreateEnricher(accessor, options);
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload);
// Assert
result.Should().ContainKey("tenantContext");
result.Should().NotContainKey("_tenant");
}
[Fact]
public void Enrich_ExcludesActorWhenDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user@test.com", null);
var options = new TenantNotificationEnricherOptions { IncludeActor = false };
var enricher = CreateEnricher(accessor, options);
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload);
// Assert
var tenant = result["_tenant"]!.AsObject();
tenant.Should().NotContainKey("actor");
}
[Fact]
public void Enrich_ExcludesCorrelationIdWhenDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", "corr-123");
var options = new TenantNotificationEnricherOptions { IncludeCorrelationId = false };
var enricher = CreateEnricher(accessor, options);
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload);
// Assert
var tenant = result["_tenant"]!.AsObject();
tenant.Should().NotContainKey("correlationId");
}
[Fact]
public void Enrich_ExcludesTimestampWhenDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var options = new TenantNotificationEnricherOptions { IncludeTimestamp = false };
var enricher = CreateEnricher(accessor, options);
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload);
// Assert
var tenant = result["_tenant"]!.AsObject();
tenant.Should().NotContainKey("timestamp");
}
[Fact]
public void Enrich_IncludesIsSystemForSystemContext()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.System("system-tenant");
var enricher = CreateEnricher(accessor);
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload);
// Assert
var tenant = result["_tenant"]!.AsObject();
tenant["isSystem"]!.GetValue<bool>().Should().BeTrue();
}
[Fact]
public void Enrich_IncludesClaims()
{
// Arrange
var accessor = new TenantContextAccessor();
var context = TenantContext.FromHeaders("tenant-123", "user", null)
.WithClaim("role", "admin")
.WithClaim("department", "engineering");
accessor.Context = context;
var enricher = CreateEnricher(accessor);
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload);
// Assert
var tenant = result["_tenant"]!.AsObject();
tenant.Should().ContainKey("claims");
var claims = tenant["claims"]!.AsObject();
claims["role"]!.GetValue<string>().Should().Be("admin");
claims["department"]!.GetValue<string>().Should().Be("engineering");
}
[Fact]
public void Enrich_WithExplicitContext_UsesProvidedContext()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("accessor-tenant", "accessor-user", null);
var enricher = CreateEnricher(accessor);
var explicitContext = TenantContext.FromHeaders("explicit-tenant", "explicit-user", "explicit-corr");
var payload = new JsonObject();
// Act
var result = enricher.Enrich(payload, explicitContext);
// Assert
var tenant = result["_tenant"]!.AsObject();
tenant["id"]!.GetValue<string>().Should().Be("explicit-tenant");
tenant["actor"]!.GetValue<string>().Should().Be("explicit-user");
}
[Fact]
public void CreateHeaders_ReturnsEmptyWhenNoContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
// Act
var headers = enricher.CreateHeaders();
// Assert
headers.Should().BeEmpty();
}
[Fact]
public void CreateHeaders_ReturnsTenantHeaders()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user@test.com", "corr-456");
var enricher = CreateEnricher(accessor);
// Act
var headers = enricher.CreateHeaders();
// Assert
headers.Should().ContainKey("X-StellaOps-Tenant");
headers["X-StellaOps-Tenant"].Should().Be("tenant-123");
headers.Should().ContainKey("X-StellaOps-Actor");
headers["X-StellaOps-Actor"].Should().Be("user@test.com");
headers.Should().ContainKey("X-Correlation-Id");
headers["X-Correlation-Id"].Should().Be("corr-456");
}
[Fact]
public void CreateHeaders_ReturnsEmptyWhenIncludeInHeadersDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var options = new TenantNotificationEnricherOptions { IncludeInHeaders = false };
var enricher = CreateEnricher(accessor, options);
// Act
var headers = enricher.CreateHeaders();
// Assert
headers.Should().BeEmpty();
}
[Fact]
public void CreateHeaders_UsesCustomHeaderNames()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var options = new TenantNotificationEnricherOptions
{
TenantHeader = "X-Custom-Tenant",
ActorHeader = "X-Custom-Actor"
};
var enricher = CreateEnricher(accessor, options);
// Act
var headers = enricher.CreateHeaders();
// Assert
headers.Should().ContainKey("X-Custom-Tenant");
headers.Should().ContainKey("X-Custom-Actor");
}
[Fact]
public void ExtractContext_ReturnsNullForNullPayload()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
// Act
var context = enricher.ExtractContext(null!);
// Assert
context.Should().BeNull();
}
[Fact]
public void ExtractContext_ReturnsNullForMissingTenantProperty()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
var payload = new JsonObject { ["data"] = "test" };
// Act
var context = enricher.ExtractContext(payload);
// Assert
context.Should().BeNull();
}
[Fact]
public void ExtractContext_ExtractsValidContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
var payload = new JsonObject
{
["_tenant"] = new JsonObject
{
["id"] = "extracted-tenant",
["actor"] = "extracted-user",
["correlationId"] = "extracted-corr",
["source"] = "HttpHeader"
}
};
// Act
var context = enricher.ExtractContext(payload);
// Assert
context.Should().NotBeNull();
context!.TenantId.Should().Be("extracted-tenant");
context.Actor.Should().Be("extracted-user");
context.CorrelationId.Should().Be("extracted-corr");
context.Source.Should().Be(TenantContextSource.HttpHeader);
}
[Fact]
public void ExtractContext_ExtractsSystemContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
var payload = new JsonObject
{
["_tenant"] = new JsonObject
{
["id"] = "system",
["actor"] = "system",
["isSystem"] = true,
["source"] = "System"
}
};
// Act
var context = enricher.ExtractContext(payload);
// Assert
context.Should().NotBeNull();
context!.IsSystemContext.Should().BeTrue();
context.Source.Should().Be(TenantContextSource.System);
}
[Fact]
public void ExtractContext_ExtractsClaims()
{
// Arrange
var accessor = new TenantContextAccessor();
var enricher = CreateEnricher(accessor);
var payload = new JsonObject
{
["_tenant"] = new JsonObject
{
["id"] = "tenant-123",
["claims"] = new JsonObject
{
["role"] = "admin",
["tier"] = "premium"
}
}
};
// Act
var context = enricher.ExtractContext(payload);
// Assert
context.Should().NotBeNull();
context!.Claims.Should().HaveCount(2);
context.Claims["role"].Should().Be("admin");
context.Claims["tier"].Should().Be("premium");
}
[Fact]
public void ExtractContext_UsesCustomPropertyName()
{
// Arrange
var accessor = new TenantContextAccessor();
var options = new TenantNotificationEnricherOptions { PayloadPropertyName = "tenantInfo" };
var enricher = CreateEnricher(accessor, options);
var payload = new JsonObject
{
["tenantInfo"] = new JsonObject
{
["id"] = "custom-tenant"
}
};
// Act
var context = enricher.ExtractContext(payload);
// Assert
context.Should().NotBeNull();
context!.TenantId.Should().Be("custom-tenant");
}
}
public sealed class TenantNotificationEnricherExtensionsTests
{
[Fact]
public void EnrichFromDictionary_CreatesEnrichedPayload()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var enricher = new DefaultTenantNotificationEnricher(
accessor,
Options.Create(new TenantNotificationEnricherOptions()),
TimeProvider.System);
var data = new Dictionary<string, object?>
{
["eventType"] = "test.event",
["value"] = 42
};
// Act
var result = enricher.EnrichFromDictionary(data);
// Assert
result.Should().ContainKey("eventType");
result.Should().ContainKey("value");
result.Should().ContainKey("_tenant");
}
[Fact]
public void EnrichAndSerialize_ReturnsJsonString()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-123", "user", null);
var enricher = new DefaultTenantNotificationEnricher(
accessor,
Options.Create(new TenantNotificationEnricherOptions()),
TimeProvider.System);
var payload = new JsonObject { ["data"] = "test" };
// Act
var result = enricher.EnrichAndSerialize(payload);
// Assert
result.Should().Contain("\"_tenant\"");
result.Should().Contain("\"tenant-123\"");
}
}


@@ -0,0 +1,367 @@
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using StellaOps.Notifier.Worker.Tenancy;
using Xunit;
namespace StellaOps.Notifier.Tests.Tenancy;
public sealed class TenantRlsEnforcerTests
{
private static DefaultTenantRlsEnforcer CreateEnforcer(
ITenantContextAccessor? accessor = null,
TenantRlsOptions? options = null)
{
accessor ??= new TenantContextAccessor();
options ??= new TenantRlsOptions();
return new DefaultTenantRlsEnforcer(
accessor,
Options.Create(options),
NullLogger<DefaultTenantRlsEnforcer>.Instance);
}
[Fact]
public async Task ValidateAccessAsync_AllowsSameTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-a", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeTrue();
result.TenantId.Should().Be("tenant-a");
result.ResourceTenantId.Should().Be("tenant-a");
result.IsSystemAccess.Should().BeFalse();
}
[Fact]
public async Task ValidateAccessAsync_DeniesDifferentTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-b", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeFalse();
result.DenialReason.Should().Contain("tenant-a");
result.DenialReason.Should().Contain("tenant-b");
}
[Fact]
public async Task ValidateAccessAsync_AllowsSystemContextBypass()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.System("system");
var options = new TenantRlsOptions { AllowSystemBypass = true };
var enforcer = CreateEnforcer(accessor, options);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-b", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeTrue();
result.IsSystemAccess.Should().BeTrue();
}
[Fact]
public async Task ValidateAccessAsync_DeniesSystemContextWhenBypassDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.System("system");
var options = new TenantRlsOptions { AllowSystemBypass = false };
var enforcer = CreateEnforcer(accessor, options);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-b", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeFalse();
}
[Fact]
public async Task ValidateAccessAsync_AllowsAdminTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("admin", "admin-user", null);
var options = new TenantRlsOptions { AdminTenantPatterns = ["^admin$"] };
var enforcer = CreateEnforcer(accessor, options);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-b", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeTrue();
result.IsSystemAccess.Should().BeTrue();
}
[Fact]
public async Task ValidateAccessAsync_AllowsGlobalResourceTypes()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantRlsOptions { GlobalResourceTypes = ["system-template"] };
var enforcer = CreateEnforcer(accessor, options);
// Act
var result = await enforcer.ValidateAccessAsync(
"system-template", "template-123", "system", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeTrue();
}
[Fact]
public async Task ValidateAccessAsync_AllowsAllWhenDisabled()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var options = new TenantRlsOptions { Enabled = false };
var enforcer = CreateEnforcer(accessor, options);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-b", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeTrue();
}
[Fact]
public async Task ValidateAccessAsync_DeniesWhenNoContext()
{
// Arrange
var accessor = new TenantContextAccessor();
var enforcer = CreateEnforcer(accessor);
// Act
var result = await enforcer.ValidateAccessAsync(
"notification", "notif-123", "tenant-a", RlsOperation.Read);
// Assert
result.IsAllowed.Should().BeFalse();
result.DenialReason.Should().Contain("context");
}
[Fact]
public async Task EnsureAccessAsync_ThrowsOnDenial()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-a", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var act = async () => await enforcer.EnsureAccessAsync(
"notification", "notif-123", "tenant-b", RlsOperation.Update);
// Assert
await act.Should().ThrowAsync<TenantAccessDeniedException>()
.Where(ex => ex.TenantId == "tenant-a" &&
ex.ResourceTenantId == "tenant-b" &&
ex.ResourceType == "notification" &&
ex.Operation == RlsOperation.Update);
}
[Fact]
public void GetCurrentTenantId_ReturnsCurrentTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-xyz", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var tenantId = enforcer.GetCurrentTenantId();
// Assert
tenantId.Should().Be("tenant-xyz");
}
[Fact]
public void HasSystemAccess_ReturnsTrueForSystemContext()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.System("system");
var enforcer = CreateEnforcer(accessor);
// Act
var hasAccess = enforcer.HasSystemAccess();
// Assert
hasAccess.Should().BeTrue();
}
[Fact]
public void HasSystemAccess_ReturnsFalseForRegularContext()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("regular-tenant", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var hasAccess = enforcer.HasSystemAccess();
// Assert
hasAccess.Should().BeFalse();
}
[Fact]
public void CreateScopedId_CreatesTenantPrefixedId()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-abc", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var scopedId = enforcer.CreateScopedId("resource-123");
// Assert
scopedId.Should().Be("tenant-abc:resource-123");
}
[Fact]
public void ExtractResourceId_ExtractsResourcePart()
{
// Arrange
var accessor = new TenantContextAccessor();
var enforcer = CreateEnforcer(accessor);
// Act
var resourceId = enforcer.ExtractResourceId("tenant-abc:resource-123");
// Assert
resourceId.Should().Be("resource-123");
}
[Fact]
public void ExtractResourceId_ReturnsNullForInvalidFormat()
{
// Arrange
var accessor = new TenantContextAccessor();
var enforcer = CreateEnforcer(accessor);
// Act
var resourceId = enforcer.ExtractResourceId("no-separator-here");
// Assert
resourceId.Should().BeNull();
}
[Fact]
public void ValidateScopedId_ReturnsTrueForSameTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-abc", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var isValid = enforcer.ValidateScopedId("tenant-abc:resource-123");
// Assert
isValid.Should().BeTrue();
}
[Fact]
public void ValidateScopedId_ReturnsFalseForDifferentTenant()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.FromHeaders("tenant-abc", "user", null);
var enforcer = CreateEnforcer(accessor);
// Act
var isValid = enforcer.ValidateScopedId("tenant-xyz:resource-123");
// Assert
isValid.Should().BeFalse();
}
[Fact]
public void ValidateScopedId_ReturnsTrueForSystemAccess()
{
// Arrange
var accessor = new TenantContextAccessor();
accessor.Context = TenantContext.System("system");
var options = new TenantRlsOptions { AdminTenantPatterns = ["^system$"] };
var enforcer = CreateEnforcer(accessor, options);
// Act
var isValid = enforcer.ValidateScopedId("any-tenant:resource-123");
// Assert
isValid.Should().BeTrue();
}
}
public sealed class RlsValidationResultTests
{
[Fact]
public void Allowed_CreatesAllowedResult()
{
// Arrange & Act
var result = RlsValidationResult.Allowed("tenant-a", "tenant-a");
// Assert
result.IsAllowed.Should().BeTrue();
result.TenantId.Should().Be("tenant-a");
result.ResourceTenantId.Should().Be("tenant-a");
result.DenialReason.Should().BeNull();
}
[Fact]
public void Denied_CreatesDeniedResult()
{
// Arrange & Act
var result = RlsValidationResult.Denied("tenant-a", "tenant-b", "Cross-tenant access denied");
// Assert
result.IsAllowed.Should().BeFalse();
result.TenantId.Should().Be("tenant-a");
result.ResourceTenantId.Should().Be("tenant-b");
result.DenialReason.Should().Be("Cross-tenant access denied");
}
}
public sealed class TenantAccessDeniedExceptionTests
{
[Fact]
public void Constructor_SetsAllProperties()
{
// Arrange & Act
var exception = new TenantAccessDeniedException(
"tenant-a", "tenant-b", "notification", "notif-123", RlsOperation.Update);
// Assert
exception.TenantId.Should().Be("tenant-a");
exception.ResourceTenantId.Should().Be("tenant-b");
exception.ResourceType.Should().Be("notification");
exception.ResourceId.Should().Be("notif-123");
exception.Operation.Should().Be(RlsOperation.Update);
exception.Message.Should().Contain("tenant-a");
exception.Message.Should().Contain("tenant-b");
exception.Message.Should().Contain("notification/notif-123");
}
}

@@ -0,0 +1,330 @@
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Options;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notifier.Worker.Correlation;
using StellaOps.Notifier.Worker.Escalation;
using Xunit;
namespace StellaOps.Notifier.WebService.Tests.Escalation;
/// <summary>
/// Tests for the acknowledgment bridge.
/// </summary>
public sealed class AckBridgeTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly Mock<IEscalationEngine> _escalationEngine;
private readonly Mock<IIncidentManager> _incidentManager;
private readonly AckBridgeOptions _options;
private readonly AckBridge _bridge;
public AckBridgeTests()
{
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2025, 1, 15, 10, 0, 0, TimeSpan.Zero));
_escalationEngine = new Mock<IEscalationEngine>();
_incidentManager = new Mock<IIncidentManager>();
_options = new AckBridgeOptions
{
AckBaseUrl = "https://notify.example.com",
SigningKey = "test-signing-key-for-unit-tests",
DefaultTokenExpiry = TimeSpan.FromHours(24)
};
_bridge = new AckBridge(
_escalationEngine.Object,
_incidentManager.Object,
null,
Options.Create(_options),
_timeProvider,
NullLogger<AckBridge>.Instance);
}
[Fact]
public async Task GenerateAckLink_CreatesValidUrl()
{
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1",
TimeSpan.FromHours(1));
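link.Should().StartWith("https://notify.example.com/ack?token=");
// The prefix check above already covers the presence of "token="; also require a non-empty token value.
ExtractToken(link).Should().NotBeNullOrEmpty();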
}
[Fact]
public async Task ValidateToken_WithValidToken_ReturnsValid()
{
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1",
TimeSpan.FromHours(1));
var token = ExtractToken(link);
var result = await _bridge.ValidateTokenAsync(token);
result.IsValid.Should().BeTrue();
result.TenantId.Should().Be("tenant-1");
result.IncidentId.Should().Be("incident-1");
result.TargetId.Should().Be("user-1");
}
[Fact]
public async Task ValidateToken_WithExpiredToken_ReturnsInvalid()
{
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1",
TimeSpan.FromMinutes(30));
var token = ExtractToken(link);
// Advance time past expiry
_timeProvider.Advance(TimeSpan.FromHours(1));
var result = await _bridge.ValidateTokenAsync(token);
result.IsValid.Should().BeFalse();
result.Error.Should().Contain("expired");
}
[Fact]
public async Task ValidateToken_WithTamperedToken_ReturnsInvalid()
{
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1");
var token = ExtractToken(link);
var tamperedToken = token.Substring(0, token.Length - 5) + "XXXXX";
var result = await _bridge.ValidateTokenAsync(tamperedToken);
result.IsValid.Should().BeFalse();
result.Error.Should().NotBeNullOrEmpty();
}
[Fact]
public async Task ValidateToken_WithMalformedToken_ReturnsInvalid()
{
var result = await _bridge.ValidateTokenAsync("not-a-valid-token");
result.IsValid.Should().BeFalse();
result.Error.Should().Contain("Invalid token format");
}
[Fact]
public async Task ProcessAck_WithSignedLink_ProcessesSuccessfully()
{
var escalationState = new EscalationState
{
TenantId = "tenant-1",
IncidentId = "incident-1",
PolicyId = "policy-1",
Status = EscalationStatus.Acknowledged
};
_escalationEngine.Setup(x => x.ProcessAcknowledgmentAsync(
"tenant-1", "incident-1", "user-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(escalationState);
_incidentManager.Setup(x => x.AcknowledgeAsync(
"tenant-1", "incident-1", "user-1", It.IsAny<CancellationToken>()))
.Returns(Task.CompletedTask);
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1");
var token = ExtractToken(link);
var request = new AckBridgeRequest
{
Source = AckSource.SignedLink,
Token = token,
AcknowledgedBy = "user-1"
};
var result = await _bridge.ProcessAckAsync(request);
result.Success.Should().BeTrue();
result.TenantId.Should().Be("tenant-1");
result.IncidentId.Should().Be("incident-1");
}
[Fact]
public async Task ProcessAck_WithInvalidToken_ReturnsFailed()
{
var request = new AckBridgeRequest
{
Source = AckSource.SignedLink,
Token = "invalid-token",
AcknowledgedBy = "user-1"
};
var result = await _bridge.ProcessAckAsync(request);
result.Success.Should().BeFalse();
result.Error.Should().NotBeNullOrEmpty();
}
[Fact]
public async Task ProcessAck_WithDirectIds_ProcessesSuccessfully()
{
var escalationState = new EscalationState
{
TenantId = "tenant-1",
IncidentId = "incident-1",
PolicyId = "policy-1",
Status = EscalationStatus.Acknowledged
};
_escalationEngine.Setup(x => x.ProcessAcknowledgmentAsync(
"tenant-1", "incident-1", "user-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(escalationState);
_incidentManager.Setup(x => x.AcknowledgeAsync(
"tenant-1", "incident-1", "user-1", It.IsAny<CancellationToken>()))
.Returns(Task.CompletedTask);
var request = new AckBridgeRequest
{
Source = AckSource.Api,
TenantId = "tenant-1",
IncidentId = "incident-1",
AcknowledgedBy = "user-1"
};
var result = await _bridge.ProcessAckAsync(request);
result.Success.Should().BeTrue();
}
[Fact]
public async Task ProcessAck_WithExternalId_ResolvesMapping()
{
var escalationState = new EscalationState
{
TenantId = "tenant-1",
IncidentId = "incident-1",
PolicyId = "policy-1",
Status = EscalationStatus.Acknowledged
};
_escalationEngine.Setup(x => x.ProcessAcknowledgmentAsync(
"tenant-1", "incident-1", "pagerduty", It.IsAny<CancellationToken>()))
.ReturnsAsync(escalationState);
_incidentManager.Setup(x => x.AcknowledgeAsync(
"tenant-1", "incident-1", "pagerduty", It.IsAny<CancellationToken>()))
.Returns(Task.CompletedTask);
// Register the external ID mapping
_bridge.RegisterExternalId(AckSource.PagerDuty, "pd-alert-123", "tenant-1", "incident-1");
var request = new AckBridgeRequest
{
Source = AckSource.PagerDuty,
ExternalId = "pd-alert-123",
AcknowledgedBy = "pagerduty"
};
var result = await _bridge.ProcessAckAsync(request);
result.Success.Should().BeTrue();
result.TenantId.Should().Be("tenant-1");
result.IncidentId.Should().Be("incident-1");
}
[Fact]
public async Task ProcessAck_WithUnknownExternalId_ReturnsFailed()
{
var request = new AckBridgeRequest
{
Source = AckSource.PagerDuty,
ExternalId = "unknown-external-id",
AcknowledgedBy = "pagerduty"
};
var result = await _bridge.ProcessAckAsync(request);
result.Success.Should().BeFalse();
result.Error.Should().Contain("Unknown external ID");
}
[Fact]
public async Task ProcessAck_WithMissingIds_ReturnsFailed()
{
var request = new AckBridgeRequest
{
Source = AckSource.Api,
AcknowledgedBy = "user-1"
};
var result = await _bridge.ProcessAckAsync(request);
result.Success.Should().BeFalse();
result.Error.Should().Contain("Could not resolve");
}
[Fact]
public async Task GenerateAckLink_UsesDefaultExpiry()
{
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1");
var token = ExtractToken(link);
var result = await _bridge.ValidateTokenAsync(token);
result.IsValid.Should().BeTrue();
result.ExpiresAt.Should().BeCloseTo(
_timeProvider.GetUtcNow().Add(_options.DefaultTokenExpiry),
TimeSpan.FromSeconds(1));
}
[Fact]
public void RegisterExternalId_AllowsMultipleMappings()
{
// Registering several mappings across sources and tenants must not throw.
// Resolving a registered mapping is covered by ProcessAck_WithExternalId_ResolvesMapping.
var act = () =>
{
_bridge.RegisterExternalId(AckSource.PagerDuty, "pd-1", "tenant-1", "incident-1");
_bridge.RegisterExternalId(AckSource.OpsGenie, "og-1", "tenant-1", "incident-2");
_bridge.RegisterExternalId(AckSource.PagerDuty, "pd-2", "tenant-2", "incident-3");
};
act.Should().NotThrow();
}
[Fact]
public async Task ValidateToken_ReturnsExpiresAt()
{
var expiry = TimeSpan.FromHours(2);
var link = await _bridge.GenerateAckLinkAsync(
"tenant-1",
"incident-1",
"user-1",
expiry);
var token = ExtractToken(link);
var result = await _bridge.ValidateTokenAsync(token);
result.IsValid.Should().BeTrue();
result.ExpiresAt.Should().BeCloseTo(
_timeProvider.GetUtcNow().Add(expiry),
TimeSpan.FromSeconds(1));
}
private static string ExtractToken(string link)
{
var uri = new Uri(link);
var query = System.Web.HttpUtility.ParseQueryString(uri.Query);
return query["token"] ?? throw new InvalidOperationException("Token not found in URL");
}
}

@@ -0,0 +1,317 @@
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Time.Testing;
using Moq;
using StellaOps.Notifier.Worker.Correlation;
using StellaOps.Notifier.Worker.Escalation;
using Xunit;
namespace StellaOps.Notifier.WebService.Tests.Escalation;
/// <summary>
/// Tests for the escalation engine.
/// </summary>
public sealed class EscalationEngineTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly Mock<IEscalationPolicyService> _policyService;
private readonly Mock<IOnCallScheduleService> _scheduleService;
private readonly Mock<IIncidentManager> _incidentManager;
private readonly EscalationEngine _engine;
public EscalationEngineTests()
{
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2025, 1, 15, 10, 0, 0, TimeSpan.Zero));
_policyService = new Mock<IEscalationPolicyService>();
_scheduleService = new Mock<IOnCallScheduleService>();
_incidentManager = new Mock<IIncidentManager>();
_engine = new EscalationEngine(
_policyService.Object,
_scheduleService.Object,
_incidentManager.Object,
null,
_timeProvider,
NullLogger<EscalationEngine>.Instance);
}
[Fact]
public async Task StartEscalation_WithValidPolicy_ReturnsStartedState()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
var result = await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
result.Should().NotBeNull();
result.TenantId.Should().Be("tenant-1");
result.IncidentId.Should().Be("incident-1");
result.PolicyId.Should().Be("policy-1");
result.Status.Should().Be(EscalationStatus.InProgress);
result.CurrentLevel.Should().Be(1);
}
[Fact]
public async Task StartEscalation_WithNonexistentPolicy_ReturnsFailedState()
{
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "nonexistent", It.IsAny<CancellationToken>()))
.ReturnsAsync((EscalationPolicy?)null);
var result = await _engine.StartEscalationAsync("tenant-1", "incident-1", "nonexistent");
result.Status.Should().Be(EscalationStatus.Failed);
result.ErrorMessage.Should().Contain("not found");
}
[Fact]
public async Task GetEscalationState_AfterStart_ReturnsState()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
var result = await _engine.GetEscalationStateAsync("tenant-1", "incident-1");
result.Should().NotBeNull();
result!.IncidentId.Should().Be("incident-1");
}
[Fact]
public async Task GetEscalationState_WhenNotStarted_ReturnsNull()
{
var result = await _engine.GetEscalationStateAsync("tenant-1", "nonexistent");
result.Should().BeNull();
}
[Fact]
public async Task ProcessAcknowledgment_StopsEscalation()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
var result = await _engine.ProcessAcknowledgmentAsync("tenant-1", "incident-1", "user-1");
result.Should().NotBeNull();
result!.Status.Should().Be(EscalationStatus.Acknowledged);
result.AcknowledgedBy.Should().Be("user-1");
result.AcknowledgedAt.Should().Be(_timeProvider.GetUtcNow());
}
[Fact]
public async Task ProcessAcknowledgment_WhenNotFound_ReturnsNull()
{
var result = await _engine.ProcessAcknowledgmentAsync("tenant-1", "nonexistent", "user-1");
result.Should().BeNull();
}
[Fact]
public async Task Escalate_MovesToNextLevel()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
var result = await _engine.EscalateAsync("tenant-1", "incident-1", "Manual escalation", "admin");
result.Should().NotBeNull();
result!.CurrentLevel.Should().Be(2);
}
[Fact]
public async Task Escalate_AtMaxLevel_StartsNewCycle()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
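// The test policy has two levels: the first escalation moves to level 2,
// and the second is expected to wrap back to level 1 and start cycle 2.
await _engine.EscalateAsync("tenant-1", "incident-1", "First escalation", "admin");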
var result = await _engine.EscalateAsync("tenant-1", "incident-1", "Second escalation", "admin");
result.Should().NotBeNull();
result!.CurrentLevel.Should().Be(1);
result.CycleCount.Should().Be(2);
}
[Fact]
public async Task StopEscalation_SetsResolvedStatus()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
var result = await _engine.StopEscalationAsync("tenant-1", "incident-1", "Incident resolved", "admin");
result.Should().BeTrue();
var state = await _engine.GetEscalationStateAsync("tenant-1", "incident-1");
state!.Status.Should().Be(EscalationStatus.Resolved);
}
[Fact]
public async Task ListActiveEscalations_ReturnsOnlyInProgress()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
await _engine.StartEscalationAsync("tenant-1", "incident-2", "policy-1");
// Acknowledging incident-1 should leave incident-2 as the only active escalation.
await _engine.ProcessAcknowledgmentAsync("tenant-1", "incident-1", "user-1");
var result = await _engine.ListActiveEscalationsAsync("tenant-1");
result.Should().HaveCount(1);
result[0].IncidentId.Should().Be("incident-2");
}
[Fact]
public async Task ProcessPendingEscalations_EscalatesOverdueItems()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
// Advance time past the escalation delay (5 minutes)
_timeProvider.Advance(TimeSpan.FromMinutes(6));
var actions = await _engine.ProcessPendingEscalationsAsync();
actions.Should().NotBeEmpty();
actions[0].ActionType.Should().Be("Escalate");
actions[0].IncidentId.Should().Be("incident-1");
}
[Fact]
public async Task ProcessPendingEscalations_DoesNotEscalateBeforeDelay()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
// Advance time but not past the delay (5 minutes)
_timeProvider.Advance(TimeSpan.FromMinutes(3));
var actions = await _engine.ProcessPendingEscalationsAsync();
actions.Should().BeEmpty();
}
[Fact]
public async Task StartEscalation_ResolvesOnCallTargets()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([
new OnCallUser { UserId = "user-1", UserName = "User One", Email = "user1@example.com" },
new OnCallUser { UserId = "user-2", UserName = "User Two", Email = "user2@example.com" }
]);
var result = await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
result.ResolvedTargets.Should().HaveCount(2);
result.ResolvedTargets.Should().Contain(t => t.UserId == "user-1");
result.ResolvedTargets.Should().Contain(t => t.UserId == "user-2");
}
[Fact]
public async Task StartEscalation_RecordsHistoryEntry()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
_policyService.Setup(x => x.GetPolicyAsync("tenant-1", "policy-1", It.IsAny<CancellationToken>()))
.ReturnsAsync(policy);
_scheduleService.Setup(x => x.GetOnCallAsync("tenant-1", "schedule-1", It.IsAny<DateTimeOffset?>(), It.IsAny<CancellationToken>()))
.ReturnsAsync([new OnCallUser { UserId = "user-1", UserName = "Test User" }]);
var result = await _engine.StartEscalationAsync("tenant-1", "incident-1", "policy-1");
result.History.Should().HaveCount(1);
result.History[0].Action.Should().Be("Started");
result.History[0].Level.Should().Be(1);
}
private static EscalationPolicy CreateTestPolicy(string tenantId, string policyId) => new()
{
PolicyId = policyId,
TenantId = tenantId,
Name = "Default Escalation",
IsDefault = true,
Levels =
[
new EscalationLevel
{
Order = 1,
DelayMinutes = 5,
Targets =
[
new EscalationTarget
{
TargetType = EscalationTargetType.OnCallSchedule,
TargetId = "schedule-1"
}
]
},
new EscalationLevel
{
Order = 2,
DelayMinutes = 15,
Targets =
[
new EscalationTarget
{
TargetType = EscalationTargetType.User,
TargetId = "manager-1"
}
]
}
]
};
}

@@ -0,0 +1,254 @@
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Escalation;
using Xunit;
namespace StellaOps.Notifier.WebService.Tests.Escalation;
/// <summary>
/// Tests for the escalation policy service.
/// </summary>
public sealed class EscalationPolicyServiceTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly InMemoryEscalationPolicyService _service;
public EscalationPolicyServiceTests()
{
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2025, 1, 15, 10, 0, 0, TimeSpan.Zero));
_service = new InMemoryEscalationPolicyService(
null,
_timeProvider,
NullLogger<InMemoryEscalationPolicyService>.Instance);
}
[Fact]
public async Task ListPolicies_WhenEmpty_ReturnsEmptyList()
{
var result = await _service.ListPoliciesAsync("tenant-1");
result.Should().BeEmpty();
}
[Fact]
public async Task UpsertPolicy_CreatesNewPolicy()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
var result = await _service.UpsertPolicyAsync(policy, "admin");
result.PolicyId.Should().Be("policy-1");
result.TenantId.Should().Be("tenant-1");
result.Name.Should().Be("Default Escalation");
result.Levels.Should().HaveCount(2);
}
[Fact]
public async Task UpsertPolicy_UpdatesExistingPolicy()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
await _service.UpsertPolicyAsync(policy, "admin");
var updated = policy with { Name = "Updated Policy" };
var result = await _service.UpsertPolicyAsync(updated, "admin");
result.Name.Should().Be("Updated Policy");
var retrieved = await _service.GetPolicyAsync("tenant-1", "policy-1");
retrieved!.Name.Should().Be("Updated Policy");
}
[Fact]
public async Task GetPolicy_WhenExists_ReturnsPolicy()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
await _service.UpsertPolicyAsync(policy, "admin");
var result = await _service.GetPolicyAsync("tenant-1", "policy-1");
result.Should().NotBeNull();
result!.PolicyId.Should().Be("policy-1");
}
[Fact]
public async Task GetPolicy_WhenNotExists_ReturnsNull()
{
var result = await _service.GetPolicyAsync("tenant-1", "nonexistent");
result.Should().BeNull();
}
[Fact]
public async Task DeletePolicy_WhenExists_ReturnsTrue()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
await _service.UpsertPolicyAsync(policy, "admin");
var result = await _service.DeletePolicyAsync("tenant-1", "policy-1", "admin");
result.Should().BeTrue();
var retrieved = await _service.GetPolicyAsync("tenant-1", "policy-1");
retrieved.Should().BeNull();
}
[Fact]
public async Task DeletePolicy_WhenNotExists_ReturnsFalse()
{
var result = await _service.DeletePolicyAsync("tenant-1", "nonexistent", "admin");
result.Should().BeFalse();
}
[Fact]
public async Task GetDefaultPolicy_ReturnsFirstDefaultPolicy()
{
var policy1 = CreateTestPolicy("tenant-1", "policy-1") with { IsDefault = false };
var policy2 = CreateTestPolicy("tenant-1", "policy-2") with { IsDefault = true };
var policy3 = CreateTestPolicy("tenant-1", "policy-3") with { IsDefault = true };
await _service.UpsertPolicyAsync(policy1, "admin");
await _service.UpsertPolicyAsync(policy2, "admin");
await _service.UpsertPolicyAsync(policy3, "admin");
var result = await _service.GetDefaultPolicyAsync("tenant-1");
result.Should().NotBeNull();
result!.IsDefault.Should().BeTrue();
}
[Fact]
public async Task GetDefaultPolicy_WhenNoneDefault_ReturnsNull()
{
var policy = CreateTestPolicy("tenant-1", "policy-1") with { IsDefault = false };
await _service.UpsertPolicyAsync(policy, "admin");
var result = await _service.GetDefaultPolicyAsync("tenant-1");
result.Should().BeNull();
}
[Fact]
public async Task FindMatchingPolicies_FiltersByEventKind()
{
var policy1 = CreateTestPolicy("tenant-1", "policy-1") with
{
EventKindFilter = ["scan.*", "vulnerability.*"]
};
var policy2 = CreateTestPolicy("tenant-1", "policy-2") with
{
EventKindFilter = ["compliance.*"]
};
await _service.UpsertPolicyAsync(policy1, "admin");
await _service.UpsertPolicyAsync(policy2, "admin");
var result = await _service.FindMatchingPoliciesAsync("tenant-1", "scan.completed", null);
result.Should().HaveCount(1);
result[0].PolicyId.Should().Be("policy-1");
}
[Fact]
public async Task FindMatchingPolicies_FiltersBySeverity()
{
var policy1 = CreateTestPolicy("tenant-1", "policy-1") with
{
SeverityFilter = ["critical", "high"]
};
var policy2 = CreateTestPolicy("tenant-1", "policy-2") with
{
SeverityFilter = ["low"]
};
await _service.UpsertPolicyAsync(policy1, "admin");
await _service.UpsertPolicyAsync(policy2, "admin");
var result = await _service.FindMatchingPoliciesAsync("tenant-1", "incident.created", "critical");
result.Should().HaveCount(1);
result[0].PolicyId.Should().Be("policy-1");
}
[Fact]
public async Task FindMatchingPolicies_ReturnsAllWhenNoFilters()
{
var policy1 = CreateTestPolicy("tenant-1", "policy-1");
var policy2 = CreateTestPolicy("tenant-1", "policy-2");
await _service.UpsertPolicyAsync(policy1, "admin");
await _service.UpsertPolicyAsync(policy2, "admin");
var result = await _service.FindMatchingPoliciesAsync("tenant-1", "any.event", null);
result.Should().HaveCount(2);
}
[Fact]
public async Task ListPolicies_IsolatesByTenant()
{
var policy1 = CreateTestPolicy("tenant-1", "policy-1");
var policy2 = CreateTestPolicy("tenant-2", "policy-2");
await _service.UpsertPolicyAsync(policy1, "admin");
await _service.UpsertPolicyAsync(policy2, "admin");
var tenant1Policies = await _service.ListPoliciesAsync("tenant-1");
var tenant2Policies = await _service.ListPoliciesAsync("tenant-2");
tenant1Policies.Should().HaveCount(1);
tenant1Policies[0].PolicyId.Should().Be("policy-1");
tenant2Policies.Should().HaveCount(1);
tenant2Policies[0].PolicyId.Should().Be("policy-2");
}
[Fact]
public async Task UpsertPolicy_SetsTimestamps()
{
var policy = CreateTestPolicy("tenant-1", "policy-1");
var result = await _service.UpsertPolicyAsync(policy, "admin");
result.CreatedAt.Should().Be(_timeProvider.GetUtcNow());
result.UpdatedAt.Should().Be(_timeProvider.GetUtcNow());
}
private static EscalationPolicy CreateTestPolicy(string tenantId, string policyId) => new()
{
PolicyId = policyId,
TenantId = tenantId,
Name = "Default Escalation",
IsDefault = true,
Levels =
[
new EscalationLevel
{
Order = 1,
DelayMinutes = 5,
Targets =
[
new EscalationTarget
{
TargetType = EscalationTargetType.OnCallSchedule,
TargetId = "schedule-1"
}
]
},
new EscalationLevel
{
Order = 2,
DelayMinutes = 15,
Targets =
[
new EscalationTarget
{
TargetType = EscalationTargetType.User,
TargetId = "manager-1"
}
]
}
]
};
}

@@ -0,0 +1,356 @@
using FluentAssertions;
using Microsoft.Extensions.Logging.Abstractions;
using Microsoft.Extensions.Time.Testing;
using StellaOps.Notifier.Worker.Escalation;
using Xunit;
namespace StellaOps.Notifier.WebService.Tests.Escalation;
/// <summary>
/// Tests for inbox channel adapters.
/// </summary>
public sealed class InboxChannelTests
{
private readonly FakeTimeProvider _timeProvider;
private readonly InAppInboxChannel _inboxChannel;
private readonly CliNotificationChannel _cliChannel;
public InboxChannelTests()
{
_timeProvider = new FakeTimeProvider(new DateTimeOffset(2025, 1, 15, 10, 0, 0, TimeSpan.Zero));
_inboxChannel = new InAppInboxChannel(
null,
_timeProvider,
NullLogger<InAppInboxChannel>.Instance);
_cliChannel = new CliNotificationChannel(
_timeProvider,
NullLogger<CliNotificationChannel>.Instance);
}
[Fact]
public async Task InApp_SendAsync_CreatesNotification()
{
var notification = CreateTestNotification("tenant-1", "user-1", "notif-1");
var result = await _inboxChannel.SendAsync(notification);
result.Success.Should().BeTrue();
result.NotificationId.Should().Be("notif-1");
}
[Fact]
public async Task InApp_ListAsync_ReturnsNotifications()
{
var notification = CreateTestNotification("tenant-1", "user-1", "notif-1");
await _inboxChannel.SendAsync(notification);
var result = await _inboxChannel.ListAsync("tenant-1", "user-1");
result.Should().HaveCount(1);
result[0].NotificationId.Should().Be("notif-1");
}
[Fact]
public async Task InApp_ListAsync_FiltersUnread()
{
var notif1 = CreateTestNotification("tenant-1", "user-1", "notif-1");
var notif2 = CreateTestNotification("tenant-1", "user-1", "notif-2");
await _inboxChannel.SendAsync(notif1);
await _inboxChannel.SendAsync(notif2);
await _inboxChannel.MarkReadAsync("tenant-1", "user-1", "notif-1");
var result = await _inboxChannel.ListAsync("tenant-1", "user-1", new InboxQuery { IsRead = false });
result.Should().HaveCount(1);
result[0].NotificationId.Should().Be("notif-2");
}
[Fact]
public async Task InApp_ListAsync_FiltersByType()
{
var incident = CreateTestNotification("tenant-1", "user-1", "notif-1") with
{
Type = InboxNotificationType.Incident
};
var system = CreateTestNotification("tenant-1", "user-1", "notif-2") with
{
Type = InboxNotificationType.System
};
await _inboxChannel.SendAsync(incident);
await _inboxChannel.SendAsync(system);
var result = await _inboxChannel.ListAsync("tenant-1", "user-1",
new InboxQuery { Type = InboxNotificationType.Incident });
result.Should().HaveCount(1);
result[0].NotificationId.Should().Be("notif-1");
}
[Fact]
public async Task InApp_ListAsync_FiltersByMinPriority()
{
var low = CreateTestNotification("tenant-1", "user-1", "notif-1") with { Priority = InboxPriority.Low };
var high = CreateTestNotification("tenant-1", "user-1", "notif-2") with { Priority = InboxPriority.High };
var urgent = CreateTestNotification("tenant-1", "user-1", "notif-3") with { Priority = InboxPriority.Urgent };
await _inboxChannel.SendAsync(low);
await _inboxChannel.SendAsync(high);
await _inboxChannel.SendAsync(urgent);
var result = await _inboxChannel.ListAsync("tenant-1", "user-1",
new InboxQuery { MinPriority = InboxPriority.High });
result.Should().HaveCount(2);
result.Should().OnlyContain(n => n.Priority >= InboxPriority.High);
}
[Fact]
public async Task InApp_ListAsync_ExcludesExpired()
{
var active = CreateTestNotification("tenant-1", "user-1", "notif-1") with
{
ExpiresAt = _timeProvider.GetUtcNow().AddHours(1)
};
var expired = CreateTestNotification("tenant-1", "user-1", "notif-2") with
{
ExpiresAt = _timeProvider.GetUtcNow().AddHours(-1)
};
await _inboxChannel.SendAsync(active);
await _inboxChannel.SendAsync(expired);
var result = await _inboxChannel.ListAsync("tenant-1", "user-1");
result.Should().HaveCount(1);
result[0].NotificationId.Should().Be("notif-1");
}
[Fact]
public async Task InApp_ListAsync_IncludesExpiredWhenRequested()
{
var active = CreateTestNotification("tenant-1", "user-1", "notif-1") with
{
ExpiresAt = _timeProvider.GetUtcNow().AddHours(1)
};
var expired = CreateTestNotification("tenant-1", "user-1", "notif-2") with
{
ExpiresAt = _timeProvider.GetUtcNow().AddHours(-1)
};
await _inboxChannel.SendAsync(active);
await _inboxChannel.SendAsync(expired);
var result = await _inboxChannel.ListAsync("tenant-1", "user-1",
new InboxQuery { IncludeExpired = true });
result.Should().HaveCount(2);
}
[Fact]
public async Task InApp_ListAsync_RespectsLimit()
{
for (int i = 0; i < 10; i++)
{
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", $"notif-{i}"));
}
var result = await _inboxChannel.ListAsync("tenant-1", "user-1",
new InboxQuery { Limit = 5 });
result.Should().HaveCount(5);
}
[Fact]
public async Task InApp_MarkReadAsync_MarksNotificationAsRead()
{
var notification = CreateTestNotification("tenant-1", "user-1", "notif-1");
await _inboxChannel.SendAsync(notification);
var result = await _inboxChannel.MarkReadAsync("tenant-1", "user-1", "notif-1");
result.Should().BeTrue();
var list = await _inboxChannel.ListAsync("tenant-1", "user-1");
list[0].IsRead.Should().BeTrue();
list[0].ReadAt.Should().Be(_timeProvider.GetUtcNow());
}
[Fact]
public async Task InApp_MarkReadAsync_ReturnsFalseForNonexistent()
{
var result = await _inboxChannel.MarkReadAsync("tenant-1", "user-1", "nonexistent");
result.Should().BeFalse();
}
[Fact]
public async Task InApp_MarkAllReadAsync_MarksAllAsRead()
{
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-1"));
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-2"));
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-3"));
var result = await _inboxChannel.MarkAllReadAsync("tenant-1", "user-1");
result.Should().Be(3);
var unread = await _inboxChannel.GetUnreadCountAsync("tenant-1", "user-1");
unread.Should().Be(0);
}
[Fact]
public async Task InApp_GetUnreadCountAsync_ReturnsCorrectCount()
{
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-1"));
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-2"));
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-3"));
await _inboxChannel.MarkReadAsync("tenant-1", "user-1", "notif-1");
var result = await _inboxChannel.GetUnreadCountAsync("tenant-1", "user-1");
result.Should().Be(2);
}
[Fact]
public async Task InApp_GetUnreadCountAsync_ExcludesExpired()
{
var active = CreateTestNotification("tenant-1", "user-1", "notif-1") with
{
ExpiresAt = _timeProvider.GetUtcNow().AddHours(1)
};
var expired = CreateTestNotification("tenant-1", "user-1", "notif-2") with
{
ExpiresAt = _timeProvider.GetUtcNow().AddHours(-1)
};
await _inboxChannel.SendAsync(active);
await _inboxChannel.SendAsync(expired);
var result = await _inboxChannel.GetUnreadCountAsync("tenant-1", "user-1");
result.Should().Be(1);
}
[Fact]
public async Task InApp_IsolatesByTenantAndUser()
{
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-1", "notif-1"));
await _inboxChannel.SendAsync(CreateTestNotification("tenant-1", "user-2", "notif-2"));
await _inboxChannel.SendAsync(CreateTestNotification("tenant-2", "user-1", "notif-3"));
var tenant1User1 = await _inboxChannel.ListAsync("tenant-1", "user-1");
var tenant1User2 = await _inboxChannel.ListAsync("tenant-1", "user-2");
var tenant2User1 = await _inboxChannel.ListAsync("tenant-2", "user-1");
tenant1User1.Should().HaveCount(1).And.Contain(n => n.NotificationId == "notif-1");
tenant1User2.Should().HaveCount(1).And.Contain(n => n.NotificationId == "notif-2");
tenant2User1.Should().HaveCount(1).And.Contain(n => n.NotificationId == "notif-3");
}
[Fact]
public async Task InApp_ListAsync_SortsByPriorityAndCreatedAt()
{
var low = CreateTestNotification("tenant-1", "user-1", "notif-low") with { Priority = InboxPriority.Low };
await _inboxChannel.SendAsync(low);
_timeProvider.Advance(TimeSpan.FromMinutes(1));
var high = CreateTestNotification("tenant-1", "user-1", "notif-high") with { Priority = InboxPriority.High };
await _inboxChannel.SendAsync(high);
_timeProvider.Advance(TimeSpan.FromMinutes(1));
var urgent = CreateTestNotification("tenant-1", "user-1", "notif-urgent") with { Priority = InboxPriority.Urgent };
await _inboxChannel.SendAsync(urgent);
var result = await _inboxChannel.ListAsync("tenant-1", "user-1");
result[0].NotificationId.Should().Be("notif-urgent");
result[1].NotificationId.Should().Be("notif-high");
result[2].NotificationId.Should().Be("notif-low");
}
[Fact]
public async Task Cli_SendAsync_CreatesNotification()
{
var notification = CreateTestNotification("tenant-1", "user-1", "notif-1");
var result = await _cliChannel.SendAsync(notification);
result.Success.Should().BeTrue();
result.NotificationId.Should().Be("notif-1");
}
[Fact]
public async Task Cli_ListAsync_ReturnsNotifications()
{
var notification = CreateTestNotification("tenant-1", "user-1", "notif-1");
await _cliChannel.SendAsync(notification);
var result = await _cliChannel.ListAsync("tenant-1", "user-1");
result.Should().HaveCount(1);
}
[Fact]
public void Cli_FormatForCli_FormatsCorrectly()
{
var notification = new InboxNotification
{
NotificationId = "notif-1",
TenantId = "tenant-1",
UserId = "user-1",
Type = InboxNotificationType.Incident,
Title = "Critical Alert",
Body = "Server down",
Priority = InboxPriority.Urgent,
IsRead = false,
CreatedAt = new DateTimeOffset(2025, 1, 15, 10, 0, 0, TimeSpan.Zero)
};
var formatted = CliNotificationChannel.FormatForCli(notification);
formatted.Should().Contain("[!!!]");
formatted.Should().Contain("●");
formatted.Should().Contain("Critical Alert");
formatted.Should().Contain("Server down");
formatted.Should().Contain("2025-01-15");
}
[Fact]
public void Cli_FormatForCli_ShowsReadMarker()
{
var notification = new InboxNotification
{
NotificationId = "notif-1",
TenantId = "tenant-1",
UserId = "user-1",
Type = InboxNotificationType.Incident,
Title = "Alert",
Body = "Details",
Priority = InboxPriority.Normal,
IsRead = true,
CreatedAt = _timeProvider.GetUtcNow()
};
var formatted = CliNotificationChannel.FormatForCli(notification);
formatted.Should().NotContain("●");
formatted.Should().Contain("[*]");
}
private static InboxNotification CreateTestNotification(string tenantId, string userId, string notificationId) => new()
{
NotificationId = notificationId,
TenantId = tenantId,
UserId = userId,
Type = InboxNotificationType.Incident,
Title = "Test Alert",
Body = "This is a test notification",
Priority = InboxPriority.Normal
};
}
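The inbox tests above pin down three listing behaviours: expired entries are hidden unless requested, higher priorities come first, and a limit caps the result. The following is a minimal sketch of logic that would satisfy them, not the channel's actual implementation. `SketchNotification`, `SketchPriority`, and `InboxListingSketch` are hypothetical stand-ins, and the newest-first tie-break within a priority is an assumption the tests do not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins: the real InboxNotification and InboxPriority types are not shown in this diff.
public enum SketchPriority { Low = 0, Normal = 1, High = 2, Urgent = 3 }

public sealed record SketchNotification(
    string NotificationId,
    SketchPriority Priority,
    DateTimeOffset CreatedAt,
    DateTimeOffset? ExpiresAt = null);

public static class InboxListingSketch
{
    // Drops expired entries unless asked to keep them, orders by priority (highest first),
    // then by creation time (newest first, an assumption), and finally applies the limit.
    public static IReadOnlyList<SketchNotification> List(
        IEnumerable<SketchNotification> items,
        DateTimeOffset now,
        bool includeExpired = false,
        int? limit = null)
    {
        var query = items
            .Where(n => includeExpired || n.ExpiresAt is null || n.ExpiresAt > now)
            .OrderByDescending(n => n.Priority)
            .ThenByDescending(n => n.CreatedAt)
            .AsEnumerable();

        if (limit is > 0)
        {
            query = query.Take(limit.Value);
        }

        return query.ToList();
    }
}
```

Passing `now` explicitly plays the same role as the injected `_timeProvider` in the tests: expiry checks stay deterministic.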

@@ -0,0 +1,121 @@
namespace StellaOps.Notifier.WebService.Contracts;
/// <summary>
/// Incident list query parameters.
/// </summary>
public sealed record IncidentListQuery
{
/// <summary>
/// Filter by status (open, acknowledged, resolved).
/// </summary>
public string? Status { get; init; }
/// <summary>
/// Filter by event kind prefix.
/// </summary>
public string? EventKindPrefix { get; init; }
/// <summary>
/// Filter incidents after this timestamp.
/// </summary>
public DateTimeOffset? Since { get; init; }
/// <summary>
/// Filter incidents before this timestamp.
/// </summary>
public DateTimeOffset? Until { get; init; }
/// <summary>
/// Maximum number of results.
/// </summary>
public int? Limit { get; init; }
/// <summary>
/// Cursor for pagination.
/// </summary>
public string? Cursor { get; init; }
}
/// <summary>
/// Incident response DTO.
/// </summary>
public sealed record IncidentResponse
{
public required string IncidentId { get; init; }
public required string TenantId { get; init; }
public required string EventKind { get; init; }
public required string Status { get; init; }
public required string Severity { get; init; }
public required string Title { get; init; }
public string? Description { get; init; }
public required int EventCount { get; init; }
public required DateTimeOffset FirstOccurrence { get; init; }
public required DateTimeOffset LastOccurrence { get; init; }
public string? AcknowledgedBy { get; init; }
public DateTimeOffset? AcknowledgedAt { get; init; }
public string? ResolvedBy { get; init; }
public DateTimeOffset? ResolvedAt { get; init; }
public List<string>? Labels { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
}
/// <summary>
/// Incident list response with pagination.
/// </summary>
public sealed record IncidentListResponse
{
public required List<IncidentResponse> Incidents { get; init; }
public required int TotalCount { get; init; }
public string? NextCursor { get; init; }
}
/// <summary>
/// Request to acknowledge an incident.
/// </summary>
public sealed record IncidentAckRequest
{
/// <summary>
/// Actor performing the acknowledgement.
/// </summary>
public string? Actor { get; init; }
/// <summary>
/// Optional comment.
/// </summary>
public string? Comment { get; init; }
}
/// <summary>
/// Request to resolve an incident.
/// </summary>
public sealed record IncidentResolveRequest
{
/// <summary>
/// Actor resolving the incident.
/// </summary>
public string? Actor { get; init; }
/// <summary>
/// Resolution reason.
/// </summary>
public string? Reason { get; init; }
/// <summary>
/// Optional comment.
/// </summary>
public string? Comment { get; init; }
}
/// <summary>
/// Delivery history item for an incident.
/// </summary>
public sealed record DeliveryHistoryItem
{
public required string DeliveryId { get; init; }
public required string ChannelType { get; init; }
public required string ChannelName { get; init; }
public required string Status { get; init; }
public required DateTimeOffset Timestamp { get; init; }
public string? ErrorMessage { get; init; }
public int Attempts { get; init; }
}
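`IncidentListQuery` and `IncidentListResponse` imply filtered, cursor-based paging, but the diff does not show how the cursor is built. The sketch below is one possible reading under stated assumptions: the cursor is a Base64-encoded offset, `Since` and `Until` are inclusive bounds on the last occurrence, and the page size defaults to 50 with a cap of 200. `SketchIncident`, `SketchPage`, and `IncidentPagingSketch` are hypothetical names, not part of the service.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical, trimmed-down shapes so the sketch compiles on its own.
public sealed record SketchIncident(string IncidentId, string EventKind, string Status, DateTimeOffset LastOccurrence);

public sealed record SketchPage(IReadOnlyList<SketchIncident> Incidents, int TotalCount, string? NextCursor);

public static class IncidentPagingSketch
{
    public static SketchPage Query(
        IEnumerable<SketchIncident> source,
        string? status = null,
        string? eventKindPrefix = null,
        DateTimeOffset? since = null,
        DateTimeOffset? until = null,
        int? limit = null,
        string? cursor = null)
    {
        // Apply each optional filter, then sort deterministically so offsets stay stable between calls.
        var filtered = source
            .Where(i => status is null || string.Equals(i.Status, status, StringComparison.OrdinalIgnoreCase))
            .Where(i => eventKindPrefix is null || i.EventKind.StartsWith(eventKindPrefix, StringComparison.Ordinal))
            .Where(i => since is null || i.LastOccurrence >= since)
            .Where(i => until is null || i.LastOccurrence <= until)
            .OrderByDescending(i => i.LastOccurrence)
            .ThenBy(i => i.IncidentId, StringComparer.Ordinal)
            .ToList();

        var offset = DecodeCursor(cursor);
        var pageSize = Math.Clamp(limit ?? 50, 1, 200); // assumed default and cap
        var page = filtered.Skip(offset).Take(pageSize).ToList();
        var next = offset + page.Count < filtered.Count ? EncodeCursor(offset + page.Count) : null;
        return new SketchPage(page, filtered.Count, next);
    }

    private static string EncodeCursor(int offset) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(offset.ToString(CultureInfo.InvariantCulture)));

    // An unreadable cursor falls back to the first page rather than failing the request.
    private static int DecodeCursor(string? cursor)
    {
        if (string.IsNullOrEmpty(cursor)) return 0;
        try
        {
            var text = Encoding.UTF8.GetString(Convert.FromBase64String(cursor));
            return int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out var value) ? value : 0;
        }
        catch (FormatException)
        {
            return 0;
        }
    }
}
```

An offset cursor is the simplest option to illustrate; a store-backed service might prefer a keyset cursor (last timestamp plus ID) so that inserts between calls do not shift pages.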

@@ -1,9 +1,35 @@
using System.ComponentModel.DataAnnotations;
using System.Text.Json.Serialization;
namespace StellaOps.Notifier.WebService.Contracts;
/// <summary>
/// Request payload for acknowledging a pack approval decision.
/// </summary>
public sealed class PackApprovalAckRequest
{
/// <summary>
/// Acknowledgement token from the notification.
/// </summary>
[Required]
[JsonPropertyName("ackToken")]
public string AckToken { get; init; } = string.Empty;
/// <summary>
/// Approval decision: "approved" or "rejected".
/// </summary>
[JsonPropertyName("decision")]
public string? Decision { get; init; }
/// <summary>
/// Optional comment for audit trail.
/// </summary>
[JsonPropertyName("comment")]
public string? Comment { get; init; }
/// <summary>
/// Identity acknowledging the approval.
/// </summary>
[JsonPropertyName("actor")]
public string? Actor { get; init; }
}

@@ -2,44 +2,87 @@ using System.Text.Json.Serialization;
namespace StellaOps.Notifier.WebService.Contracts;
/// <summary>
/// Request payload for pack approval events from Task Runner.
/// See: docs/notifications/pack-approvals-contract.md
/// </summary>
public sealed class PackApprovalRequest
{
/// <summary>
/// Unique event identifier for deduplication.
/// </summary>
[JsonPropertyName("eventId")]
public Guid EventId { get; init; }
/// <summary>
/// Event timestamp in UTC (ISO 8601).
/// </summary>
[JsonPropertyName("issuedAt")]
public DateTimeOffset IssuedAt { get; init; }
/// <summary>
/// Event type: pack.approval.requested, pack.approval.updated, pack.policy.hold, pack.policy.released.
/// </summary>
[JsonPropertyName("kind")]
public string Kind { get; init; } = string.Empty;
/// <summary>
/// Package identifier in PURL format.
/// </summary>
[JsonPropertyName("packId")]
public string PackId { get; init; } = string.Empty;
/// <summary>
/// Policy metadata (id and version).
/// </summary>
[JsonPropertyName("policy")]
public PackApprovalPolicy? Policy { get; init; }
/// <summary>
/// Current approval state: pending, approved, rejected, hold, expired.
/// </summary>
[JsonPropertyName("decision")]
public string Decision { get; init; } = string.Empty;
/// <summary>
/// Identity that triggered the event.
/// </summary>
[JsonPropertyName("actor")]
public string Actor { get; init; } = string.Empty;
/// <summary>
/// Opaque token for Task Runner resume flow. Echoed in X-Resume-After header.
/// </summary>
[JsonPropertyName("resumeToken")]
public string? ResumeToken { get; init; }
/// <summary>
/// Human-readable summary for notifications.
/// </summary>
[JsonPropertyName("summary")]
public string? Summary { get; init; }
/// <summary>
/// Custom key-value metadata labels.
/// </summary>
[JsonPropertyName("labels")]
public Dictionary<string, string>? Labels { get; init; }
}
/// <summary>
/// Policy metadata associated with a pack approval.
/// </summary>
public sealed class PackApprovalPolicy
{
/// <summary>
/// Policy identifier.
/// </summary>
[JsonPropertyName("id")]
public string? Id { get; init; }
/// <summary>
/// Policy version.
/// </summary>
[JsonPropertyName("version")]
public string? Version { get; init; }
}
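The `[JsonPropertyName]` attributes above fix the camelCase wire names for pack approval events. As a rough illustration of what a Task Runner payload could look like and how `System.Text.Json` binds it, the sketch below deserializes a sample document. `PackApprovalRequestSketch` is a trimmed, hypothetical copy of the contract, and every value in the payload (identifiers, timestamp, token) is made up for the example.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Trimmed, hypothetical copy of the contract so the sketch compiles on its own.
public sealed class PackApprovalRequestSketch
{
    [JsonPropertyName("eventId")] public Guid EventId { get; init; }
    [JsonPropertyName("issuedAt")] public DateTimeOffset IssuedAt { get; init; }
    [JsonPropertyName("kind")] public string Kind { get; init; } = string.Empty;
    [JsonPropertyName("packId")] public string PackId { get; init; } = string.Empty;
    [JsonPropertyName("decision")] public string Decision { get; init; } = string.Empty;
    [JsonPropertyName("resumeToken")] public string? ResumeToken { get; init; }
}

public static class PackApprovalJsonSketch
{
    // Illustrative payload only; none of these values come from the real contract document.
    public const string SamplePayload = """
    {
      "eventId": "5f0c1d2e-3a4b-4c5d-8e9f-0a1b2c3d4e5f",
      "issuedAt": "2025-11-27T13:05:48Z",
      "kind": "pack.approval.requested",
      "packId": "pkg:oci/example-pack@sha256:abc",
      "decision": "pending",
      "resumeToken": "opaque-token"
    }
    """;

    // Binding relies on the explicit property names, so no naming policy is needed.
    public static PackApprovalRequestSketch? Parse(string json) =>
        JsonSerializer.Deserialize<PackApprovalRequestSketch>(json);
}
```

Because the names are declared per property, the mapping holds regardless of the serializer's naming policy.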

@@ -0,0 +1,114 @@
using System.Text.Json.Serialization;
namespace StellaOps.Notifier.WebService.Contracts;
/// <summary>
/// Request to create or update a notification rule.
/// </summary>
public sealed record RuleCreateRequest
{
public required string RuleId { get; init; }
public required string Name { get; init; }
public string? Description { get; init; }
public bool Enabled { get; init; } = true;
public required RuleMatchRequest Match { get; init; }
public required List<RuleActionRequest> Actions { get; init; }
public Dictionary<string, string>? Labels { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
}
/// <summary>
/// Request to update an existing rule.
/// </summary>
public sealed record RuleUpdateRequest
{
public string? Name { get; init; }
public string? Description { get; init; }
public bool? Enabled { get; init; }
public RuleMatchRequest? Match { get; init; }
public List<RuleActionRequest>? Actions { get; init; }
public Dictionary<string, string>? Labels { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
}
/// <summary>
/// Rule match criteria.
/// </summary>
public sealed record RuleMatchRequest
{
public List<string>? EventKinds { get; init; }
public List<string>? Namespaces { get; init; }
public List<string>? Repositories { get; init; }
public List<string>? Digests { get; init; }
public List<string>? Labels { get; init; }
public List<string>? ComponentPurls { get; init; }
public string? MinSeverity { get; init; }
public List<string>? Verdicts { get; init; }
public bool? KevOnly { get; init; }
}
/// <summary>
/// Rule action configuration.
/// </summary>
public sealed record RuleActionRequest
{
public required string ActionId { get; init; }
public required string Channel { get; init; }
public string? Template { get; init; }
public string? Digest { get; init; }
public string? Throttle { get; init; } // ISO 8601 duration
public string? Locale { get; init; }
public bool Enabled { get; init; } = true;
public Dictionary<string, string>? Metadata { get; init; }
}
/// <summary>
/// Rule response DTO.
/// </summary>
public sealed record RuleResponse
{
public required string RuleId { get; init; }
public required string TenantId { get; init; }
public required string Name { get; init; }
public string? Description { get; init; }
public required bool Enabled { get; init; }
public required RuleMatchResponse Match { get; init; }
public required List<RuleActionResponse> Actions { get; init; }
public Dictionary<string, string>? Labels { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
public string? CreatedBy { get; init; }
public DateTimeOffset CreatedAt { get; init; }
public string? UpdatedBy { get; init; }
public DateTimeOffset UpdatedAt { get; init; }
}
/// <summary>
/// Rule match response.
/// </summary>
public sealed record RuleMatchResponse
{
public List<string> EventKinds { get; init; } = [];
public List<string> Namespaces { get; init; } = [];
public List<string> Repositories { get; init; } = [];
public List<string> Digests { get; init; } = [];
public List<string> Labels { get; init; } = [];
public List<string> ComponentPurls { get; init; } = [];
public string? MinSeverity { get; init; }
public List<string> Verdicts { get; init; } = [];
public bool KevOnly { get; init; }
}
/// <summary>
/// Rule action response.
/// </summary>
public sealed record RuleActionResponse
{
public required string ActionId { get; init; }
public required string Channel { get; init; }
public string? Template { get; init; }
public string? Digest { get; init; }
public string? Throttle { get; init; }
public string? Locale { get; init; }
public required bool Enabled { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
}
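`RuleActionRequest.Throttle` is documented as an ISO 8601 duration carried as a string, so it has to be parsed before it can be used as a time window. The sketch below shows one way to do that with `System.Xml.XmlConvert.ToTimeSpan`, which accepts values such as `PT5M`. `ThrottleSketch` is a hypothetical helper, and treating empty, malformed, or non-positive values as "no throttle" is an assumption rather than the service's documented behaviour.

```csharp
using System;
using System.Xml;

public static class ThrottleSketch
{
    // Returns true and a positive window when the value is a valid ISO 8601 duration.
    // Anything else (null, blank, malformed, zero or negative) yields false and TimeSpan.Zero.
    public static bool TryParseThrottle(string? value, out TimeSpan window)
    {
        window = TimeSpan.Zero;
        if (string.IsNullOrWhiteSpace(value))
        {
            return false;
        }

        try
        {
            var parsed = XmlConvert.ToTimeSpan(value.Trim());
            if (parsed <= TimeSpan.Zero)
            {
                return false;
            }

            window = parsed;
            return true;
        }
        catch (Exception ex) when (ex is FormatException or OverflowException)
        {
            return false;
        }
    }
}
```

Validating the string when a rule is created or updated would surface bad durations to the caller instead of at dispatch time.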

@@ -0,0 +1,118 @@
using System.Text.Json.Nodes;
namespace StellaOps.Notifier.WebService.Contracts;
/// <summary>
/// Request to preview a template rendering.
/// </summary>
public sealed record TemplatePreviewRequest
{
/// <summary>
/// Template ID to preview (mutually exclusive with TemplateBody).
/// </summary>
public string? TemplateId { get; init; }
/// <summary>
/// Raw template body to preview (mutually exclusive with TemplateId).
/// </summary>
public string? TemplateBody { get; init; }
/// <summary>
/// Sample event payload for rendering.
/// </summary>
public JsonObject? SamplePayload { get; init; }
/// <summary>
/// Event kind for context.
/// </summary>
public string? EventKind { get; init; }
/// <summary>
/// Sample attributes.
/// </summary>
public Dictionary<string, string>? SampleAttributes { get; init; }
/// <summary>
/// Output format override.
/// </summary>
public string? OutputFormat { get; init; }
}
/// <summary>
/// Response from template preview.
/// </summary>
public sealed record TemplatePreviewResponse
{
/// <summary>
/// Rendered body content.
/// </summary>
public required string RenderedBody { get; init; }
/// <summary>
/// Rendered subject (if applicable).
/// </summary>
public string? RenderedSubject { get; init; }
/// <summary>
/// Content hash for deduplication.
/// </summary>
public required string BodyHash { get; init; }
/// <summary>
/// Output format used.
/// </summary>
public required string Format { get; init; }
/// <summary>
/// Validation warnings (if any).
/// </summary>
public List<string>? Warnings { get; init; }
}
/// <summary>
/// Request to create or update a template.
/// </summary>
public sealed record TemplateCreateRequest
{
public required string TemplateId { get; init; }
public required string Key { get; init; }
public required string ChannelType { get; init; }
public required string Locale { get; init; }
public required string Body { get; init; }
public string? RenderMode { get; init; }
public string? Format { get; init; }
public string? Description { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
}
/// <summary>
/// Template response DTO.
/// </summary>
public sealed record TemplateResponse
{
public required string TemplateId { get; init; }
public required string TenantId { get; init; }
public required string Key { get; init; }
public required string ChannelType { get; init; }
public required string Locale { get; init; }
public required string Body { get; init; }
public required string RenderMode { get; init; }
public required string Format { get; init; }
public string? Description { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
public string? CreatedBy { get; init; }
public DateTimeOffset CreatedAt { get; init; }
public string? UpdatedBy { get; init; }
public DateTimeOffset UpdatedAt { get; init; }
}
/// <summary>
/// Template list query parameters.
/// </summary>
public sealed record TemplateListQuery
{
public string? KeyPrefix { get; init; }
public string? ChannelType { get; init; }
public string? Locale { get; init; }
public int? Limit { get; init; }
}
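Two details of the preview contract lend themselves to a short illustration: `TemplateId` and `TemplateBody` are mutually exclusive, and `BodyHash` is a content hash used for deduplication. The sketch below shows how both could be handled. `TemplatePreviewSketch` is hypothetical, the error messages are invented, and the choice of lowercase hex SHA-256 over the UTF-8 body is an assumption, since the diff does not say which hash the service uses.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class TemplatePreviewSketch
{
    // Returns an error message when the request names both sources or neither; null when valid.
    public static string? ValidateSource(string? templateId, string? templateBody)
    {
        var hasId = !string.IsNullOrWhiteSpace(templateId);
        var hasBody = !string.IsNullOrWhiteSpace(templateBody);
        return (hasId, hasBody) switch
        {
            (true, true) => "Specify either TemplateId or TemplateBody, not both.",
            (false, false) => "Either TemplateId or TemplateBody is required.",
            _ => null
        };
    }

    // Assumed hashing scheme: SHA-256 over the UTF-8 bytes, rendered as lowercase hex.
    public static string ComputeBodyHash(string renderedBody) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(renderedBody))).ToLowerInvariant();
}
```

Identical rendered bodies produce identical hashes, which is the property deduplication depends on.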

@@ -0,0 +1,796 @@
using Microsoft.AspNetCore.Mvc;
using StellaOps.Notifier.Worker.Escalation;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// API endpoints for escalation management.
/// </summary>
public static class EscalationEndpoints
{
/// <summary>
/// Maps escalation endpoints.
/// </summary>
public static IEndpointRouteBuilder MapEscalationEndpoints(this IEndpointRouteBuilder app)
{
// Escalation Policies
var policies = app.MapGroup("/api/v2/escalation-policies")
.WithTags("Escalation Policies")
.WithOpenApi();
policies.MapGet("/", ListPoliciesAsync)
.WithName("ListEscalationPolicies")
.WithSummary("List escalation policies");
policies.MapGet("/{policyId}", GetPolicyAsync)
.WithName("GetEscalationPolicy")
.WithSummary("Get an escalation policy");
policies.MapPost("/", CreatePolicyAsync)
.WithName("CreateEscalationPolicy")
.WithSummary("Create an escalation policy");
policies.MapPut("/{policyId}", UpdatePolicyAsync)
.WithName("UpdateEscalationPolicy")
.WithSummary("Update an escalation policy");
policies.MapDelete("/{policyId}", DeletePolicyAsync)
.WithName("DeleteEscalationPolicy")
.WithSummary("Delete an escalation policy");
// On-Call Schedules
var schedules = app.MapGroup("/api/v2/oncall-schedules")
.WithTags("On-Call Schedules")
.WithOpenApi();
schedules.MapGet("/", ListSchedulesAsync)
.WithName("ListOnCallSchedules")
.WithSummary("List on-call schedules");
schedules.MapGet("/{scheduleId}", GetScheduleAsync)
.WithName("GetOnCallSchedule")
.WithSummary("Get an on-call schedule");
schedules.MapPost("/", CreateScheduleAsync)
.WithName("CreateOnCallSchedule")
.WithSummary("Create an on-call schedule");
schedules.MapPut("/{scheduleId}", UpdateScheduleAsync)
.WithName("UpdateOnCallSchedule")
.WithSummary("Update an on-call schedule");
schedules.MapDelete("/{scheduleId}", DeleteScheduleAsync)
.WithName("DeleteOnCallSchedule")
.WithSummary("Delete an on-call schedule");
schedules.MapGet("/{scheduleId}/oncall", GetCurrentOnCallAsync)
.WithName("GetCurrentOnCall")
.WithSummary("Get current on-call users");
schedules.MapPost("/{scheduleId}/overrides", CreateOverrideAsync)
.WithName("CreateOnCallOverride")
.WithSummary("Create an on-call override");
schedules.MapDelete("/{scheduleId}/overrides/{overrideId}", DeleteOverrideAsync)
.WithName("DeleteOnCallOverride")
.WithSummary("Delete an on-call override");
// Active Escalations
var escalations = app.MapGroup("/api/v2/escalations")
.WithTags("Escalations")
.WithOpenApi();
escalations.MapGet("/", ListActiveEscalationsAsync)
.WithName("ListActiveEscalations")
.WithSummary("List active escalations");
escalations.MapGet("/{incidentId}", GetEscalationStateAsync)
.WithName("GetEscalationState")
.WithSummary("Get escalation state for an incident");
escalations.MapPost("/{incidentId}/start", StartEscalationAsync)
.WithName("StartEscalation")
.WithSummary("Start escalation for an incident");
escalations.MapPost("/{incidentId}/escalate", ManualEscalateAsync)
.WithName("ManualEscalate")
.WithSummary("Manually escalate to next level");
escalations.MapPost("/{incidentId}/stop", StopEscalationAsync)
.WithName("StopEscalation")
.WithSummary("Stop escalation");
// Ack Bridge
var ack = app.MapGroup("/api/v2/ack")
.WithTags("Acknowledgment")
.WithOpenApi();
ack.MapPost("/", ProcessAckAsync)
.WithName("ProcessAck")
.WithSummary("Process an acknowledgment");
ack.MapGet("/", ProcessAckLinkAsync)
.WithName("ProcessAckLink")
.WithSummary("Process an acknowledgment link");
ack.MapPost("/webhook/pagerduty", ProcessPagerDutyWebhookAsync)
.WithName("PagerDutyWebhook")
.WithSummary("Process PagerDuty webhook");
ack.MapPost("/webhook/opsgenie", ProcessOpsGenieWebhookAsync)
.WithName("OpsGenieWebhook")
.WithSummary("Process OpsGenie webhook");
return app;
}
#region Policy Endpoints
private static async Task<IResult> ListPoliciesAsync(
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IEscalationPolicyService policyService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var policies = await policyService.ListPoliciesAsync(tenantId, cancellationToken);
return Results.Ok(policies);
}
private static async Task<IResult> GetPolicyAsync(
string policyId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IEscalationPolicyService policyService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var policy = await policyService.GetPolicyAsync(tenantId, policyId, cancellationToken);
return policy is null
? Results.NotFound(new { error = $"Policy '{policyId}' not found." })
: Results.Ok(policy);
}
private static async Task<IResult> CreatePolicyAsync(
[FromBody] EscalationPolicyApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantIdHeader,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IEscalationPolicyService policyService,
CancellationToken cancellationToken)
{
var tenantId = request.TenantId ?? tenantIdHeader;
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "Tenant ID is required." });
}
if (string.IsNullOrWhiteSpace(request.Name))
{
return Results.BadRequest(new { error = "Policy name is required." });
}
if (request.Levels is null || request.Levels.Count == 0)
{
return Results.BadRequest(new { error = "At least one escalation level is required." });
}
var policy = MapToPolicy(request, tenantId);
var created = await policyService.UpsertPolicyAsync(policy, actor, cancellationToken);
return Results.Created($"/api/v2/escalation-policies/{created.PolicyId}", created);
}
private static async Task<IResult> UpdatePolicyAsync(
string policyId,
[FromBody] EscalationPolicyApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantIdHeader,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IEscalationPolicyService policyService,
CancellationToken cancellationToken)
{
var tenantId = request.TenantId ?? tenantIdHeader;
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "Tenant ID is required." });
}
var existing = await policyService.GetPolicyAsync(tenantId, policyId, cancellationToken);
if (existing is null)
{
return Results.NotFound(new { error = $"Policy '{policyId}' not found." });
}
var policy = MapToPolicy(request, tenantId) with
{
PolicyId = policyId,
CreatedAt = existing.CreatedAt,
CreatedBy = existing.CreatedBy
};
var updated = await policyService.UpsertPolicyAsync(policy, actor, cancellationToken);
return Results.Ok(updated);
}
private static async Task<IResult> DeletePolicyAsync(
string policyId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IEscalationPolicyService policyService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var deleted = await policyService.DeletePolicyAsync(tenantId, policyId, actor, cancellationToken);
return deleted ? Results.NoContent() : Results.NotFound(new { error = $"Policy '{policyId}' not found." });
}
#endregion
#region Schedule Endpoints
private static async Task<IResult> ListSchedulesAsync(
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var schedules = await scheduleService.ListSchedulesAsync(tenantId, cancellationToken);
return Results.Ok(schedules);
}
private static async Task<IResult> GetScheduleAsync(
string scheduleId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var schedule = await scheduleService.GetScheduleAsync(tenantId, scheduleId, cancellationToken);
return schedule is null
? Results.NotFound(new { error = $"Schedule '{scheduleId}' not found." })
: Results.Ok(schedule);
}
private static async Task<IResult> CreateScheduleAsync(
[FromBody] OnCallScheduleApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantIdHeader,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
var tenantId = request.TenantId ?? tenantIdHeader;
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "Tenant ID is required." });
}
if (string.IsNullOrWhiteSpace(request.Name))
{
return Results.BadRequest(new { error = "Schedule name is required." });
}
var schedule = MapToSchedule(request, tenantId);
var created = await scheduleService.UpsertScheduleAsync(schedule, actor, cancellationToken);
return Results.Created($"/api/v2/oncall-schedules/{created.ScheduleId}", created);
}
private static async Task<IResult> UpdateScheduleAsync(
string scheduleId,
[FromBody] OnCallScheduleApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantIdHeader,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
var tenantId = request.TenantId ?? tenantIdHeader;
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "Tenant ID is required." });
}
var existing = await scheduleService.GetScheduleAsync(tenantId, scheduleId, cancellationToken);
if (existing is null)
{
return Results.NotFound(new { error = $"Schedule '{scheduleId}' not found." });
}
var schedule = MapToSchedule(request, tenantId) with
{
ScheduleId = scheduleId,
CreatedAt = existing.CreatedAt,
CreatedBy = existing.CreatedBy
};
var updated = await scheduleService.UpsertScheduleAsync(schedule, actor, cancellationToken);
return Results.Ok(updated);
}
private static async Task<IResult> DeleteScheduleAsync(
string scheduleId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var deleted = await scheduleService.DeleteScheduleAsync(tenantId, scheduleId, actor, cancellationToken);
return deleted ? Results.NoContent() : Results.NotFound(new { error = $"Schedule '{scheduleId}' not found." });
}
private static async Task<IResult> GetCurrentOnCallAsync(
string scheduleId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromQuery] DateTimeOffset? atTime,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var users = await scheduleService.GetCurrentOnCallAsync(tenantId, scheduleId, atTime, cancellationToken);
return Results.Ok(users);
}
private static async Task<IResult> CreateOverrideAsync(
string scheduleId,
[FromBody] OnCallOverrideApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
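if (string.IsNullOrWhiteSpace(request.UserId))
{
return Results.BadRequest(new { error = "User ID is required." });
}
// Default the end time relative to the effective start so a future StartsAt cannot yield an inverted window.
var startsAt = request.StartsAt ?? DateTimeOffset.UtcNow;
var endsAt = request.EndsAt ?? startsAt.AddHours(8);
if (endsAt <= startsAt)
{
return Results.BadRequest(new { error = "EndsAt must be later than StartsAt." });
}
var @override = new OnCallOverride
{
OverrideId = request.OverrideId ?? Guid.NewGuid().ToString("N")[..16],
User = new OnCallUser
{
UserId = request.UserId,
Name = request.UserName ?? request.UserId
},
StartsAt = startsAt,
EndsAt = endsAt,
Reason = request.Reason
};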
try
{
var created = await scheduleService.CreateOverrideAsync(tenantId, scheduleId, @override, actor, cancellationToken);
return Results.Created($"/api/v2/oncall-schedules/{scheduleId}/overrides/{created.OverrideId}", created);
}
catch (InvalidOperationException ex)
{
return Results.NotFound(new { error = ex.Message });
}
}
private static async Task<IResult> DeleteOverrideAsync(
string scheduleId,
string overrideId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IOnCallScheduleService scheduleService,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var deleted = await scheduleService.DeleteOverrideAsync(tenantId, scheduleId, overrideId, actor, cancellationToken);
return deleted ? Results.NoContent() : Results.NotFound(new { error = "Override not found." });
}
#endregion
#region Escalation Endpoints
private static async Task<IResult> ListActiveEscalationsAsync(
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IEscalationEngine escalationEngine,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var escalations = await escalationEngine.ListActiveEscalationsAsync(tenantId, cancellationToken);
return Results.Ok(escalations);
}
private static async Task<IResult> GetEscalationStateAsync(
string incidentId,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IEscalationEngine escalationEngine,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var state = await escalationEngine.GetEscalationStateAsync(tenantId, incidentId, cancellationToken);
return state is null
? Results.NotFound(new { error = $"No escalation found for incident '{incidentId}'." })
: Results.Ok(state);
}
private static async Task<IResult> StartEscalationAsync(
string incidentId,
[FromBody] StartEscalationApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromServices] IEscalationEngine escalationEngine,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
if (string.IsNullOrWhiteSpace(request.PolicyId))
{
return Results.BadRequest(new { error = "Policy ID is required." });
}
try
{
var state = await escalationEngine.StartEscalationAsync(tenantId, incidentId, request.PolicyId, cancellationToken);
return Results.Created($"/api/v2/escalations/{incidentId}", state);
}
catch (InvalidOperationException ex)
{
return Results.BadRequest(new { error = ex.Message });
}
}
private static async Task<IResult> ManualEscalateAsync(
string incidentId,
[FromBody] ManualEscalateApiRequest? request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IEscalationEngine escalationEngine,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var state = await escalationEngine.EscalateAsync(tenantId, incidentId, request?.Reason, actor, cancellationToken);
return state is null
? Results.NotFound(new { error = $"No active escalation found for incident '{incidentId}'." })
: Results.Ok(state);
}
private static async Task<IResult> StopEscalationAsync(
string incidentId,
[FromBody] StopEscalationApiRequest request,
[FromHeader(Name = "X-Tenant-Id")] string? tenantId,
[FromHeader(Name = "X-Actor")] string? actor,
[FromServices] IEscalationEngine escalationEngine,
CancellationToken cancellationToken)
{
if (string.IsNullOrWhiteSpace(tenantId))
{
return Results.BadRequest(new { error = "X-Tenant-Id header is required." });
}
var stopped = await escalationEngine.StopEscalationAsync(
tenantId, incidentId, request.Reason ?? "Manually stopped", actor, cancellationToken);
return stopped
? Results.NoContent()
: Results.NotFound(new { error = $"No active escalation found for incident '{incidentId}'." });
}
#endregion
#region Ack Endpoints
private static async Task<IResult> ProcessAckAsync(
[FromBody] AckApiRequest request,
[FromServices] IAckBridge ackBridge,
CancellationToken cancellationToken)
{
var bridgeRequest = new AckBridgeRequest
{
Source = AckSource.Api,
TenantId = request.TenantId,
IncidentId = request.IncidentId,
AcknowledgedBy = request.AcknowledgedBy ?? "api",
Comment = request.Comment
};
var result = await ackBridge.ProcessAckAsync(bridgeRequest, cancellationToken);
return result.Success
? Results.Ok(result)
: Results.BadRequest(new { error = result.Error });
}
private static async Task<IResult> ProcessAckLinkAsync(
[FromQuery] string token,
[FromServices] IAckBridge ackBridge,
CancellationToken cancellationToken)
{
var validation = await ackBridge.ValidateTokenAsync(token, cancellationToken);
if (!validation.IsValid)
{
return Results.BadRequest(new { error = validation.Error });
}
var bridgeRequest = new AckBridgeRequest
{
Source = AckSource.SignedLink,
Token = token,
AcknowledgedBy = validation.TargetId ?? "link"
};
var result = await ackBridge.ProcessAckAsync(bridgeRequest, cancellationToken);
return result.Success
? Results.Ok(new { message = "Acknowledged successfully", incidentId = result.IncidentId })
: Results.BadRequest(new { error = result.Error });
}
private static async Task<IResult> ProcessPagerDutyWebhookAsync(
HttpContext context,
[FromServices] IEnumerable<IExternalIntegrationAdapter> adapters,
[FromServices] IAckBridge ackBridge,
CancellationToken cancellationToken)
{
var pagerDutyAdapter = adapters.OfType<PagerDutyAdapter>().FirstOrDefault();
if (pagerDutyAdapter is null)
{
return Results.BadRequest(new { error = "PagerDuty integration not configured." });
}
using var reader = new StreamReader(context.Request.Body);
var payload = await reader.ReadToEndAsync(cancellationToken);
var request = pagerDutyAdapter.ParseWebhook(payload);
if (request is null)
{
return Results.Ok(new { message = "Webhook received but no action taken." });
}
var result = await ackBridge.ProcessAckAsync(request, cancellationToken);
return Results.Ok(new { processed = result.Success });
}
private static async Task<IResult> ProcessOpsGenieWebhookAsync(
HttpContext context,
[FromServices] IEnumerable<IExternalIntegrationAdapter> adapters,
[FromServices] IAckBridge ackBridge,
CancellationToken cancellationToken)
{
var opsGenieAdapter = adapters.OfType<OpsGenieAdapter>().FirstOrDefault();
if (opsGenieAdapter is null)
{
return Results.BadRequest(new { error = "OpsGenie integration not configured." });
}
using var reader = new StreamReader(context.Request.Body);
var payload = await reader.ReadToEndAsync(cancellationToken);
var request = opsGenieAdapter.ParseWebhook(payload);
if (request is null)
{
return Results.Ok(new { message = "Webhook received but no action taken." });
}
var result = await ackBridge.ProcessAckAsync(request, cancellationToken);
return Results.Ok(new { processed = result.Success });
}
#endregion
#region Mapping
private static EscalationPolicy MapToPolicy(EscalationPolicyApiRequest request, string tenantId) => new()
{
PolicyId = request.PolicyId ?? Guid.NewGuid().ToString("N")[..16],
TenantId = tenantId,
Name = request.Name!,
Description = request.Description,
IsDefault = request.IsDefault ?? false,
Enabled = request.Enabled ?? true,
EventKinds = request.EventKinds,
MinSeverity = request.MinSeverity,
Levels = request.Levels!.Select((l, i) => new EscalationLevel
{
Level = l.Level ?? i + 1,
Name = l.Name,
EscalateAfter = TimeSpan.FromMinutes(l.EscalateAfterMinutes ?? 15),
Targets = l.Targets?.Select(t => new EscalationTarget
{
Type = Enum.TryParse<EscalationTargetType>(t.Type, true, out var type) ? type : EscalationTargetType.User,
TargetId = t.TargetId ?? "",
Name = t.Name,
ChannelId = t.ChannelId
}).ToList() ?? [],
NotifyMode = Enum.TryParse<EscalationNotifyMode>(l.NotifyMode, true, out var mode) ? mode : EscalationNotifyMode.All,
StopOnAck = l.StopOnAck ?? true
}).ToList(),
ExhaustedAction = Enum.TryParse<EscalationExhaustedAction>(request.ExhaustedAction, true, out var action)
? action : EscalationExhaustedAction.RepeatLastLevel,
MaxCycles = request.MaxCycles ?? 3
};
private static OnCallSchedule MapToSchedule(OnCallScheduleApiRequest request, string tenantId) => new()
{
ScheduleId = request.ScheduleId ?? Guid.NewGuid().ToString("N")[..16],
TenantId = tenantId,
Name = request.Name!,
Description = request.Description,
Timezone = request.Timezone ?? "UTC",
Enabled = request.Enabled ?? true,
Layers = request.Layers?.Select(l => new RotationLayer
{
Name = l.Name ?? "Default",
Priority = l.Priority ?? 100,
Users = l.Users?.Select((u, i) => new OnCallUser
{
UserId = u.UserId ?? "",
Name = u.Name ?? u.UserId ?? "",
Email = u.Email,
Phone = u.Phone,
PreferredChannelId = u.PreferredChannelId,
Order = u.Order ?? i
}).ToList() ?? [],
Type = Enum.TryParse<RotationType>(l.RotationType, true, out var type) ? type : RotationType.Weekly,
HandoffTime = TimeOnly.TryParse(l.HandoffTime, out var time) ? time : new TimeOnly(9, 0),
RotationInterval = TimeSpan.FromDays(l.RotationIntervalDays ?? 7),
RotationStart = l.RotationStart ?? DateTimeOffset.UtcNow,
Restrictions = l.Restrictions?.Select(r => new ScheduleRestriction
{
Type = Enum.TryParse<RestrictionType>(r.Type, true, out var rType) ? rType : RestrictionType.DaysOfWeek,
DaysOfWeek = r.DaysOfWeek,
StartTime = TimeOnly.TryParse(r.StartTime, out var start) ? start : null,
EndTime = TimeOnly.TryParse(r.EndTime, out var end) ? end : null
}).ToList(),
Enabled = l.Enabled ?? true
}).ToList() ?? []
};
#endregion
}
#region API Request Models
public sealed class EscalationPolicyApiRequest
{
public string? PolicyId { get; set; }
public string? TenantId { get; set; }
public string? Name { get; set; }
public string? Description { get; set; }
public bool? IsDefault { get; set; }
public bool? Enabled { get; set; }
public List<string>? EventKinds { get; set; }
public string? MinSeverity { get; set; }
public List<EscalationLevelApiRequest>? Levels { get; set; }
public string? ExhaustedAction { get; set; }
public int? MaxCycles { get; set; }
}
public sealed class EscalationLevelApiRequest
{
public int? Level { get; set; }
public string? Name { get; set; }
public int? EscalateAfterMinutes { get; set; }
public List<EscalationTargetApiRequest>? Targets { get; set; }
public string? NotifyMode { get; set; }
public bool? StopOnAck { get; set; }
}
public sealed class EscalationTargetApiRequest
{
public string? Type { get; set; }
public string? TargetId { get; set; }
public string? Name { get; set; }
public string? ChannelId { get; set; }
}
public sealed class OnCallScheduleApiRequest
{
public string? ScheduleId { get; set; }
public string? TenantId { get; set; }
public string? Name { get; set; }
public string? Description { get; set; }
public string? Timezone { get; set; }
public bool? Enabled { get; set; }
public List<RotationLayerApiRequest>? Layers { get; set; }
}
public sealed class RotationLayerApiRequest
{
public string? Name { get; set; }
public int? Priority { get; set; }
public List<OnCallUserApiRequest>? Users { get; set; }
public string? RotationType { get; set; }
public string? HandoffTime { get; set; }
public int? RotationIntervalDays { get; set; }
public DateTimeOffset? RotationStart { get; set; }
public List<ScheduleRestrictionApiRequest>? Restrictions { get; set; }
public bool? Enabled { get; set; }
}
public sealed class OnCallUserApiRequest
{
public string? UserId { get; set; }
public string? Name { get; set; }
public string? Email { get; set; }
public string? Phone { get; set; }
public string? PreferredChannelId { get; set; }
public int? Order { get; set; }
}
public sealed class ScheduleRestrictionApiRequest
{
public string? Type { get; set; }
public List<int>? DaysOfWeek { get; set; }
public string? StartTime { get; set; }
public string? EndTime { get; set; }
}
public sealed class OnCallOverrideApiRequest
{
public string? OverrideId { get; set; }
public string? UserId { get; set; }
public string? UserName { get; set; }
public DateTimeOffset? StartsAt { get; set; }
public DateTimeOffset? EndsAt { get; set; }
public string? Reason { get; set; }
}
public sealed class StartEscalationApiRequest
{
public string? PolicyId { get; set; }
}
public sealed class ManualEscalateApiRequest
{
public string? Reason { get; set; }
}
public sealed class StopEscalationApiRequest
{
public string? Reason { get; set; }
}
public sealed class AckApiRequest
{
public string? TenantId { get; set; }
public string? IncidentId { get; set; }
public string? AcknowledgedBy { get; set; }
public string? Comment { get; set; }
}
#endregion
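
The mapping helpers above repeat one pattern: parse an optional string into an enum and fall back to a default when parsing fails. A minimal sketch of that pattern as a reusable helper follows. `RotationType` here is a hypothetical stand-in (the real enum members are not shown in this diff), and `ParseEnumOrDefault` is an illustrative name, not part of the commit. The extra `Enum.IsDefined` check is an assumption about desired behaviour: `Enum.TryParse` also succeeds on numeric strings such as "42", which would otherwise produce an undefined enum value.

```csharp
using System;

// Hypothetical stand-in for the real enum, whose members are not shown in this diff.
public enum RotationType { Daily, Weekly, Custom }

public static class ParseHelpers
{
    // Returns the parsed value, or the fallback when the input is null, empty,
    // unrecognised, or a number that does not correspond to a defined member.
    public static TEnum ParseEnumOrDefault<TEnum>(string? value, TEnum fallback)
        where TEnum : struct, Enum
        => Enum.TryParse<TEnum>(value, true, out var parsed) && Enum.IsDefined(parsed)
            ? parsed
            : fallback;
}
```

With this helper, `ParseHelpers.ParseEnumOrDefault("weekly", RotationType.Daily)` yields `RotationType.Weekly`, while a null input or the string "42" yields the supplied fallback.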

@@ -0,0 +1,193 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using StellaOps.Notify.Models;
using StellaOps.Notifier.Worker.Fallback;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// REST API endpoints for fallback handler operations.
/// </summary>
public static class FallbackEndpoints
{
/// <summary>
/// Maps fallback API endpoints.
/// </summary>
public static RouteGroupBuilder MapFallbackEndpoints(this IEndpointRouteBuilder endpoints)
{
var group = endpoints.MapGroup("/api/v2/fallback")
.WithTags("Fallback")
.WithOpenApi();
// Get fallback statistics
group.MapGet("/statistics", async (
int? windowHours,
HttpContext context,
IFallbackHandler fallbackHandler,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var window = windowHours.HasValue ? TimeSpan.FromHours(windowHours.Value) : (TimeSpan?)null;
var stats = await fallbackHandler.GetStatisticsAsync(tenantId, window, cancellationToken);
return Results.Ok(new
{
stats.TenantId,
window = stats.Window.ToString(),
stats.TotalDeliveries,
stats.PrimarySuccesses,
stats.FallbackAttempts,
stats.FallbackSuccesses,
stats.ExhaustedDeliveries,
successRate = $"{stats.SuccessRate:P1}",
fallbackUtilizationRate = $"{stats.FallbackUtilizationRate:P1}",
failuresByChannel = stats.FailuresByChannel.ToDictionary(
kvp => kvp.Key.ToString(),
kvp => kvp.Value)
});
})
.WithName("GetFallbackStatistics")
.WithSummary("Gets fallback handling statistics for a tenant");
// Get fallback chain for a channel
group.MapGet("/chains/{channelType}", async (
NotifyChannelType channelType,
HttpContext context,
IFallbackHandler fallbackHandler,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var chain = await fallbackHandler.GetFallbackChainAsync(tenantId, channelType, cancellationToken);
return Results.Ok(new
{
tenantId,
primaryChannel = channelType.ToString(),
fallbackChain = chain.Select(c => c.ToString()).ToList(),
chainLength = chain.Count
});
})
.WithName("GetFallbackChain")
.WithSummary("Gets the fallback chain for a channel type");
// Set fallback chain for a channel
group.MapPut("/chains/{channelType}", async (
NotifyChannelType channelType,
SetFallbackChainRequest request,
HttpContext context,
IFallbackHandler fallbackHandler,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var actor = context.Request.Headers["X-Actor"].FirstOrDefault() ?? "system";
// Unrecognised channel names are dropped silently; matching is case-insensitive.
var chain = request.FallbackChain
.Select(s => Enum.TryParse<NotifyChannelType>(s, true, out var t) ? t : (NotifyChannelType?)null)
.Where(t => t.HasValue)
.Select(t => t!.Value)
.ToList();
await fallbackHandler.SetFallbackChainAsync(tenantId, channelType, chain, actor, cancellationToken);
return Results.Ok(new
{
message = "Fallback chain updated successfully",
primaryChannel = channelType.ToString(),
fallbackChain = chain.Select(c => c.ToString()).ToList()
});
})
.WithName("SetFallbackChain")
.WithSummary("Sets a custom fallback chain for a channel type");
// Test fallback resolution
group.MapPost("/test", async (
TestFallbackRequest request,
HttpContext context,
IFallbackHandler fallbackHandler,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
if (!Enum.TryParse<NotifyChannelType>(request.FailedChannelType, true, out var channelType))
{
return Results.BadRequest(new { error = $"Invalid channel type: {request.FailedChannelType}" });
}
var deliveryId = $"test-{Guid.NewGuid():N}"[..20];
// Simulate failure recording
await fallbackHandler.RecordFailureAsync(
tenantId, deliveryId, channelType, "Test failure", cancellationToken);
// Get fallback result
var result = await fallbackHandler.GetFallbackAsync(
tenantId, channelType, deliveryId, cancellationToken);
// Clean up test state
await fallbackHandler.ClearDeliveryStateAsync(tenantId, deliveryId, cancellationToken);
return Results.Ok(new
{
testDeliveryId = deliveryId,
result.HasFallback,
nextChannelType = result.NextChannelType?.ToString(),
result.AttemptNumber,
result.TotalChannels,
result.IsExhausted,
result.ExhaustionReason,
failedChannels = result.FailedChannels.Select(f => new
{
channelType = f.ChannelType.ToString(),
f.Reason,
f.FailedAt,
f.AttemptNumber
}).ToList()
});
})
.WithName("TestFallback")
.WithSummary("Tests fallback resolution without affecting real deliveries");
// Clear delivery state
group.MapDelete("/deliveries/{deliveryId}", async (
string deliveryId,
HttpContext context,
IFallbackHandler fallbackHandler,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
await fallbackHandler.ClearDeliveryStateAsync(tenantId, deliveryId, cancellationToken);
return Results.Ok(new { message = $"Delivery state for '{deliveryId}' cleared" });
})
.WithName("ClearDeliveryFallbackState")
.WithSummary("Clears fallback state for a specific delivery");
return group;
}
}
/// <summary>
/// Request to set a custom fallback chain.
/// </summary>
public sealed record SetFallbackChainRequest
{
/// <summary>
/// Ordered list of fallback channel types.
/// </summary>
public required List<string> FallbackChain { get; init; }
}
/// <summary>
/// Request to test fallback resolution.
/// </summary>
public sealed record TestFallbackRequest
{
/// <summary>
/// The channel type that "failed".
/// </summary>
public required string FailedChannelType { get; init; }
}
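
The SetFallbackChain handler above discards channel names it cannot parse, so a typo in the request shortens the stored chain without any error. One possible alternative, sketched here under stated assumptions, is to collect the invalid entries and let the caller reject the request. `ChannelType` is a hypothetical stand-in for `NotifyChannelType` (its members are not shown in this diff), and `FallbackChainParser` is an illustrative name only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for NotifyChannelType; the real members are not shown in this diff.
public enum ChannelType { Email, Slack, Teams, Webhook }

public static class FallbackChainParser
{
    // Parses every entry, preserving order. Returns false when at least one
    // entry is not a defined channel type; those entries are listed in 'invalid'.
    public static bool TryParseChain(
        IEnumerable<string> raw,
        out List<ChannelType> chain,
        out List<string> invalid)
    {
        chain = new List<ChannelType>();
        invalid = new List<string>();

        foreach (var entry in raw)
        {
            if (Enum.TryParse<ChannelType>(entry, true, out var parsed) && Enum.IsDefined(parsed))
            {
                chain.Add(parsed);
            }
            else
            {
                invalid.Add(entry);
            }
        }

        return invalid.Count == 0;
    }
}
```

A handler could call `TryParseChain` and return a 400 response naming the contents of `invalid` when the method returns false, instead of persisting a partial chain.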

@@ -0,0 +1,311 @@
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using StellaOps.Notify.Models;
using StellaOps.Notify.Storage.Mongo.Documents;
using StellaOps.Notify.Storage.Mongo.Repositories;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// Maps incident (delivery) management endpoints.
/// </summary>
public static class IncidentEndpoints
{
public static IEndpointRouteBuilder MapIncidentEndpoints(this IEndpointRouteBuilder app)
{
var group = app.MapGroup("/api/v2/incidents")
.WithTags("Incidents");
group.MapGet("/", ListIncidentsAsync)
.WithName("ListIncidents")
.WithSummary("Lists notification incidents (deliveries)");
group.MapGet("/{deliveryId}", GetIncidentAsync)
.WithName("GetIncident")
.WithSummary("Gets an incident by delivery ID");
group.MapPost("/{deliveryId}/ack", AcknowledgeIncidentAsync)
.WithName("AcknowledgeIncident")
.WithSummary("Acknowledges an incident");
group.MapGet("/stats", GetIncidentStatsAsync)
.WithName("GetIncidentStats")
.WithSummary("Gets incident statistics");
return app;
}
private static async Task<IResult> ListIncidentsAsync(
HttpContext context,
INotifyDeliveryRepository deliveries,
string? status = null,
string? kind = null,
string? ruleId = null,
int? limit = null,
string? continuationToken = null,
DateTimeOffset? since = null)
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
// Query deliveries with filtering
var queryResult = await deliveries.QueryAsync(
tenantId,
since,
status,
limit ?? 50,
continuationToken,
context.RequestAborted);
IEnumerable<NotifyDelivery> filtered = queryResult.Items;
// Apply additional filters not supported by the repository.
// These run after paging, so a page may contain fewer than 'limit' items.
if (!string.IsNullOrWhiteSpace(kind))
{
filtered = filtered.Where(d => d.Kind.Equals(kind, StringComparison.OrdinalIgnoreCase));
}
if (!string.IsNullOrWhiteSpace(ruleId))
{
filtered = filtered.Where(d => d.RuleId.Equals(ruleId, StringComparison.OrdinalIgnoreCase));
}
var response = filtered.Select(MapToDeliveryResponse).ToList();
// Add continuation token header for pagination
if (!string.IsNullOrWhiteSpace(queryResult.ContinuationToken))
{
context.Response.Headers["X-Continuation-Token"] = queryResult.ContinuationToken;
}
return Results.Ok(response);
}
private static async Task<IResult> GetIncidentAsync(
HttpContext context,
string deliveryId,
INotifyDeliveryRepository deliveries)
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var delivery = await deliveries.GetAsync(tenantId, deliveryId, context.RequestAborted);
if (delivery is null)
{
return Results.NotFound(Error("incident_not_found", $"Incident '{deliveryId}' not found.", context));
}
return Results.Ok(MapToDeliveryResponse(delivery));
}
private static async Task<IResult> AcknowledgeIncidentAsync(
HttpContext context,
string deliveryId,
DeliveryAckRequest request,
INotifyDeliveryRepository deliveries,
INotifyAuditRepository audit,
TimeProvider timeProvider)
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var actor = GetActor(context);
var delivery = await deliveries.GetAsync(tenantId, deliveryId, context.RequestAborted);
if (delivery is null)
{
return Results.NotFound(Error("incident_not_found", $"Incident '{deliveryId}' not found.", context));
}
// Update delivery status based on acknowledgment
var newStatus = request.Resolution?.ToLowerInvariant() switch
{
"resolved" => NotifyDeliveryStatus.Delivered,
"dismissed" => NotifyDeliveryStatus.Failed,
_ => delivery.Status
};
var attempt = new NotifyDeliveryAttempt(
timestamp: timeProvider.GetUtcNow(),
status: NotifyDeliveryAttemptStatus.Success,
reason: $"Acknowledged by {actor}: {request.Comment ?? request.Resolution ?? "ack"}");
var updated = delivery with
{
Status = newStatus,
StatusReason = request.Comment ?? $"Acknowledged: {request.Resolution ?? "ack"}",
CompletedAt = timeProvider.GetUtcNow(),
Attempts = delivery.Attempts.Add(attempt)
};
await deliveries.UpdateAsync(updated, context.RequestAborted);
await AppendAuditAsync(audit, tenantId, actor, "incident.acknowledged", deliveryId, "incident", new
{
deliveryId,
request.Resolution,
request.Comment
}, timeProvider, context.RequestAborted);
return Results.Ok(MapToDeliveryResponse(updated));
}
private static async Task<IResult> GetIncidentStatsAsync(
HttpContext context,
INotifyDeliveryRepository deliveries)
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var allDeliveries = await deliveries.ListAsync(tenantId, context.RequestAborted);
var stats = new DeliveryStatsResponse
{
Total = allDeliveries.Count,
Pending = allDeliveries.Count(d => d.Status == NotifyDeliveryStatus.Pending),
Delivered = allDeliveries.Count(d => d.Status == NotifyDeliveryStatus.Delivered),
Failed = allDeliveries.Count(d => d.Status == NotifyDeliveryStatus.Failed),
ByKind = allDeliveries
.GroupBy(d => d.Kind)
.ToDictionary(g => g.Key, g => g.Count()),
ByRule = allDeliveries
.GroupBy(d => d.RuleId)
.ToDictionary(g => g.Key, g => g.Count())
};
return Results.Ok(stats);
}
private static DeliveryResponse MapToDeliveryResponse(NotifyDelivery delivery)
{
return new DeliveryResponse
{
DeliveryId = delivery.DeliveryId,
TenantId = delivery.TenantId,
RuleId = delivery.RuleId,
ActionId = delivery.ActionId,
EventId = delivery.EventId.ToString(),
Kind = delivery.Kind,
Status = delivery.Status.ToString(),
StatusReason = delivery.StatusReason,
AttemptCount = delivery.Attempts.Length,
LastAttempt = delivery.Attempts.Length > 0 ? delivery.Attempts[^1].Timestamp : null,
CreatedAt = delivery.CreatedAt,
SentAt = delivery.SentAt,
CompletedAt = delivery.CompletedAt,
Metadata = delivery.Metadata.ToDictionary(kvp => kvp.Key, kvp => kvp.Value)
};
}
private static string? GetTenantId(HttpContext context)
{
var tenantId = context.Request.Headers["X-StellaOps-Tenant"].ToString();
return string.IsNullOrWhiteSpace(tenantId) ? null : tenantId;
}
private static string GetActor(HttpContext context)
{
var actor = context.Request.Headers["X-StellaOps-Actor"].ToString();
return string.IsNullOrWhiteSpace(actor) ? "api" : actor;
}
private static async Task AppendAuditAsync(
INotifyAuditRepository audit,
string tenantId,
string actor,
string action,
string entityId,
string entityType,
object payload,
TimeProvider timeProvider,
CancellationToken cancellationToken)
{
try
{
var entry = new NotifyAuditEntryDocument
{
TenantId = tenantId,
Actor = actor,
Action = action,
EntityId = entityId,
EntityType = entityType,
Timestamp = timeProvider.GetUtcNow(),
Payload = MongoDB.Bson.Serialization.BsonSerializer.Deserialize<MongoDB.Bson.BsonDocument>(
JsonSerializer.Serialize(payload))
};
await audit.AppendAsync(entry, cancellationToken);
}
catch
{
// Audit logging is best-effort: a failed audit write must not fail the request.
}
}
private static object Error(string code, string message, HttpContext context) => new
{
error = new
{
code,
message,
traceId = context.TraceIdentifier
}
};
}
/// <summary>
/// Delivery acknowledgment request for v2 API.
/// </summary>
public sealed record DeliveryAckRequest
{
public string? Resolution { get; init; }
public string? Comment { get; init; }
}
/// <summary>
/// Delivery response DTO for v2 API.
/// </summary>
public sealed record DeliveryResponse
{
public required string DeliveryId { get; init; }
public required string TenantId { get; init; }
public required string RuleId { get; init; }
public required string ActionId { get; init; }
public required string EventId { get; init; }
public required string Kind { get; init; }
public required string Status { get; init; }
public string? StatusReason { get; init; }
public required int AttemptCount { get; init; }
public DateTimeOffset? LastAttempt { get; init; }
public DateTimeOffset CreatedAt { get; init; }
public DateTimeOffset? SentAt { get; init; }
public DateTimeOffset? CompletedAt { get; init; }
public Dictionary<string, string>? Metadata { get; init; }
}
/// <summary>
/// Delivery statistics response for v2 API.
/// </summary>
public sealed record DeliveryStatsResponse
{
public required int Total { get; init; }
public required int Pending { get; init; }
public required int Delivered { get; init; }
public required int Failed { get; init; }
public required Dictionary<string, int> ByKind { get; init; }
public required Dictionary<string, int> ByRule { get; init; }
}
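
ListIncidents returns a JSON array and, when more results exist, an `X-Continuation-Token` response header. A minimal client-side sketch of walking all pages follows. It assumes the `HttpClient` has its `BaseAddress` set to the Notifier web service, and that `limit` and `continuationToken` bind from the query string (the default for simple handler parameters in minimal APIs). `IncidentPager` is an illustrative name, not part of this commit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public static class IncidentPager
{
    // Fetches every page of incidents for a tenant by following X-Continuation-Token.
    public static async Task<List<JsonElement>> ListAllAsync(
        HttpClient http, string tenant, CancellationToken ct = default)
    {
        var all = new List<JsonElement>();
        string? token = null;

        do
        {
            var url = "api/v2/incidents/?limit=50"
                + (token is null ? "" : "&continuationToken=" + Uri.EscapeDataString(token));

            using var request = new HttpRequestMessage(HttpMethod.Get, url);
            request.Headers.Add("X-StellaOps-Tenant", tenant);

            using var response = await http.SendAsync(request, ct);
            response.EnsureSuccessStatusCode();

            await using var body = await response.Content.ReadAsStreamAsync(ct);
            using var doc = await JsonDocument.ParseAsync(body, cancellationToken: ct);
            foreach (var item in doc.RootElement.EnumerateArray())
            {
                // Clone so the element stays valid after the document is disposed.
                all.Add(item.Clone());
            }

            token = response.Headers.TryGetValues("X-Continuation-Token", out var values)
                ? values.FirstOrDefault()
                : null;
        } while (!string.IsNullOrEmpty(token));

        return all;
    }
}
```

Because the `kind` and `ruleId` filters are applied to each page after the repository query, a short or even empty page does not mean the listing is finished; only a missing continuation token does, which is why the loop keys on the header rather than on the item count.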

@@ -0,0 +1,316 @@
using System.Collections.Concurrent;
using System.Net.WebSockets;
using System.Text;
using System.Text.Json;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using StellaOps.Notify.Models;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// WebSocket live feed for real-time incident updates.
/// </summary>
public static class IncidentLiveFeed
{
private static readonly ConcurrentDictionary<string, ConcurrentBag<WebSocket>> _tenantSubscriptions = new();
public static IEndpointRouteBuilder MapIncidentLiveFeed(this IEndpointRouteBuilder app)
{
app.Map("/api/v2/incidents/live", HandleWebSocketAsync);
return app;
}
private static async Task HandleWebSocketAsync(HttpContext context)
{
if (!context.WebSockets.IsWebSocketRequest)
{
context.Response.StatusCode = StatusCodes.Status400BadRequest;
await context.Response.WriteAsJsonAsync(new
{
error = new
{
code = "websocket_required",
message = "This endpoint requires a WebSocket connection.",
traceId = context.TraceIdentifier
}
});
return;
}
var tenantId = context.Request.Headers["X-StellaOps-Tenant"].ToString();
if (string.IsNullOrWhiteSpace(tenantId))
{
// Try query string fallback for WebSocket clients that can't set headers
tenantId = context.Request.Query["tenant"].ToString();
}
if (string.IsNullOrWhiteSpace(tenantId))
{
context.Response.StatusCode = StatusCodes.Status400BadRequest;
await context.Response.WriteAsJsonAsync(new
{
error = new
{
code = "tenant_missing",
message = "X-StellaOps-Tenant header or 'tenant' query parameter is required.",
traceId = context.TraceIdentifier
}
});
return;
}
using var webSocket = await context.WebSockets.AcceptWebSocketAsync();
var subscriptions = _tenantSubscriptions.GetOrAdd(tenantId, _ => new ConcurrentBag<WebSocket>());
subscriptions.Add(webSocket);
try
{
// Send connection acknowledgment
var ackMessage = JsonSerializer.Serialize(new
{
type = "connected",
tenantId,
timestamp = DateTimeOffset.UtcNow
});
await SendMessageAsync(webSocket, ackMessage, context.RequestAborted);
// Keep connection alive and handle incoming messages
await ReceiveMessagesAsync(webSocket, tenantId, context.RequestAborted);
}
finally
{
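// Remove from subscriptions. TryUpdate is best-effort: if another connection swapped
// the bag concurrently, the closed socket stays until a later cleanup skips it.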
var newBag = new ConcurrentBag<WebSocket>(
subscriptions.Where(s => s != webSocket && s.State == WebSocketState.Open));
_tenantSubscriptions.TryUpdate(tenantId, newBag, subscriptions);
}
}
private static async Task ReceiveMessagesAsync(WebSocket webSocket, string tenantId, CancellationToken cancellationToken)
{
var buffer = new byte[4096];
while (webSocket.State == WebSocketState.Open && !cancellationToken.IsCancellationRequested)
{
try
{
var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), cancellationToken);
if (result.MessageType == WebSocketMessageType.Close)
{
await webSocket.CloseAsync(
WebSocketCloseStatus.NormalClosure,
"Client initiated close",
cancellationToken);
break;
}
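if (result.MessageType == WebSocketMessageType.Text)
{
// NOTE: each frame is decoded on its own. A message larger than the 4 KB buffer
// arrives in several frames (result.EndOfMessage == false) and will fail JSON parsing.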
var message = Encoding.UTF8.GetString(buffer, 0, result.Count);
await HandleClientMessageAsync(webSocket, tenantId, message, cancellationToken);
}
}
catch (WebSocketException)
{
break;
}
catch (OperationCanceledException)
{
break;
}
}
}
private static async Task HandleClientMessageAsync(WebSocket webSocket, string tenantId, string message, CancellationToken cancellationToken)
{
try
{
using var doc = JsonDocument.Parse(message);
var root = doc.RootElement;
if (root.TryGetProperty("type", out var typeElement))
{
var type = typeElement.GetString();
switch (type)
{
case "ping":
var pongResponse = JsonSerializer.Serialize(new
{
type = "pong",
timestamp = DateTimeOffset.UtcNow
});
await SendMessageAsync(webSocket, pongResponse, cancellationToken);
break;
case "subscribe":
// Filter subscriptions (e.g., specific rule IDs, kinds) are not applied yet;
// the request is only acknowledged.
var subResponse = JsonSerializer.Serialize(new
{
type = "subscribed",
tenantId,
timestamp = DateTimeOffset.UtcNow
});
await SendMessageAsync(webSocket, subResponse, cancellationToken);
break;
default:
var errorResponse = JsonSerializer.Serialize(new
{
type = "error",
message = $"Unknown message type: {type}"
});
await SendMessageAsync(webSocket, errorResponse, cancellationToken);
break;
}
}
}
catch (JsonException)
{
var errorResponse = JsonSerializer.Serialize(new
{
type = "error",
message = "Invalid JSON message"
});
await SendMessageAsync(webSocket, errorResponse, cancellationToken);
}
}
private static async Task SendMessageAsync(WebSocket webSocket, string message, CancellationToken cancellationToken)
{
if (webSocket.State != WebSocketState.Open)
{
return;
}
var bytes = Encoding.UTF8.GetBytes(message);
await webSocket.SendAsync(
new ArraySegment<byte>(bytes),
WebSocketMessageType.Text,
endOfMessage: true,
cancellationToken);
}
/// <summary>
/// Broadcasts an incident update to all connected clients for the specified tenant.
/// </summary>
public static async Task BroadcastIncidentUpdateAsync(
string tenantId,
NotifyDelivery delivery,
string updateType,
CancellationToken cancellationToken = default)
{
if (!_tenantSubscriptions.TryGetValue(tenantId, out var subscriptions))
{
return;
}
var message = JsonSerializer.Serialize(new
{
type = "incident_update",
updateType, // created, updated, acknowledged, delivered, failed
timestamp = DateTimeOffset.UtcNow,
incident = new
{
deliveryId = delivery.DeliveryId,
tenantId = delivery.TenantId,
ruleId = delivery.RuleId,
actionId = delivery.ActionId,
eventId = delivery.EventId.ToString(),
kind = delivery.Kind,
status = delivery.Status.ToString(),
statusReason = delivery.StatusReason,
attemptCount = delivery.Attempts.Length,
createdAt = delivery.CreatedAt,
sentAt = delivery.SentAt,
completedAt = delivery.CompletedAt
}
});
var deadSockets = new List<WebSocket>();
foreach (var socket in subscriptions)
{
if (socket.State != WebSocketState.Open)
{
deadSockets.Add(socket);
continue;
}
try
{
await SendMessageAsync(socket, message, cancellationToken);
}
catch
{
deadSockets.Add(socket);
}
}
// Clean up dead sockets
if (deadSockets.Count > 0)
{
var newBag = new ConcurrentBag<WebSocket>(
subscriptions.Where(s => !deadSockets.Contains(s)));
_tenantSubscriptions.TryUpdate(tenantId, newBag, subscriptions);
}
}
/// <summary>
/// Broadcasts incident statistics update to all connected clients for the specified tenant.
/// </summary>
public static async Task BroadcastStatsUpdateAsync(
string tenantId,
int total,
int pending,
int delivered,
int failed,
CancellationToken cancellationToken = default)
{
if (!_tenantSubscriptions.TryGetValue(tenantId, out var subscriptions))
{
return;
}
var message = JsonSerializer.Serialize(new
{
type = "stats_update",
timestamp = DateTimeOffset.UtcNow,
stats = new
{
total,
pending,
delivered,
failed
}
});
foreach (var socket in subscriptions.Where(s => s.State == WebSocketState.Open))
{
try
{
await SendMessageAsync(socket, message, cancellationToken);
}
catch
{
// Ignore send failures
}
}
}
/// <summary>
/// Gets the count of active WebSocket connections for a tenant.
/// </summary>
public static int GetConnectionCount(string tenantId)
{
if (_tenantSubscriptions.TryGetValue(tenantId, out var subscriptions))
{
return subscriptions.Count(s => s.State == WebSocketState.Open);
}
return 0;
}
}
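
The `incident_update` broadcast above is a JSON envelope with a `type` discriminator and a nested `incident` object. The following is a minimal sketch of how a consumer might read it, assuming only the property names visible in `BroadcastIncidentUpdateAsync`. The sample values (`delivery-001`, `tenant-a`, `Pending`) are hypothetical, and only a subset of the incident fields is shown.

```csharp
using System;
using System.Text.Json;

// Hypothetical sample values; the real handler copies these from a NotifyDelivery.
var message = JsonSerializer.Serialize(new
{
    type = "incident_update",
    updateType = "created",
    timestamp = DateTimeOffset.UtcNow,
    incident = new
    {
        deliveryId = "delivery-001",
        tenantId = "tenant-a",
        status = "Pending",
        attemptCount = 0
    }
});

// A client can branch on the "type" discriminator before reading the payload.
using var doc = JsonDocument.Parse(message);
var root = doc.RootElement;
var kind = root.GetProperty("type").GetString();
var update = root.GetProperty("updateType").GetString();
var delivery = root.GetProperty("incident").GetProperty("deliveryId").GetString();

if (kind == "incident_update")
{
    Console.WriteLine($"{update}: {delivery}");
}
```

Because the anonymous-type members are already written in camelCase, no naming policy is needed for the wire format to match what a JavaScript client would expect.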

@@ -0,0 +1,305 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using StellaOps.Notifier.Worker.Localization;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// REST API endpoints for localization operations.
/// </summary>
public static class LocalizationEndpoints
{
/// <summary>
/// Maps localization API endpoints.
/// </summary>
public static RouteGroupBuilder MapLocalizationEndpoints(this IEndpointRouteBuilder endpoints)
{
var group = endpoints.MapGroup("/api/v2/localization")
.WithTags("Localization")
.WithOpenApi();
// List bundles
group.MapGet("/bundles", async (
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var bundles = await localizationService.ListBundlesAsync(tenantId, cancellationToken);
return Results.Ok(new
{
tenantId,
bundles = bundles.Select(b => new
{
b.BundleId,
b.TenantId,
b.Locale,
b.Namespace,
stringCount = b.Strings.Count,
b.Priority,
b.Enabled,
b.Source,
b.Description,
b.CreatedAt,
b.UpdatedAt
}).ToList(),
count = bundles.Count
});
})
.WithName("ListLocalizationBundles")
.WithSummary("Lists all localization bundles for a tenant");
// Get supported locales
group.MapGet("/locales", async (
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var locales = await localizationService.GetSupportedLocalesAsync(tenantId, cancellationToken);
return Results.Ok(new
{
tenantId,
locales,
count = locales.Count
});
})
.WithName("GetSupportedLocales")
.WithSummary("Gets all supported locales for a tenant");
// Get bundle contents
group.MapGet("/bundles/{locale}", async (
string locale,
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var strings = await localizationService.GetBundleAsync(tenantId, locale, cancellationToken);
return Results.Ok(new
{
tenantId,
locale,
strings,
count = strings.Count
});
})
.WithName("GetLocalizationBundle")
.WithSummary("Gets all localized strings for a locale");
// Get single string
group.MapGet("/strings/{key}", async (
string key,
string? locale,
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var effectiveLocale = locale ?? "en-US";
var value = await localizationService.GetStringAsync(tenantId, key, effectiveLocale, cancellationToken);
return Results.Ok(new
{
tenantId,
key,
locale = effectiveLocale,
value
});
})
.WithName("GetLocalizedString")
.WithSummary("Gets a single localized string");
// Format string with parameters
group.MapPost("/strings/{key}/format", async (
string key,
FormatStringRequest request,
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var locale = request.Locale ?? "en-US";
var parameters = request.Parameters ?? new Dictionary<string, object>();
var value = await localizationService.GetFormattedStringAsync(
tenantId, key, locale, parameters, cancellationToken);
return Results.Ok(new
{
tenantId,
key,
locale,
formatted = value
});
})
.WithName("FormatLocalizedString")
.WithSummary("Gets a localized string with parameter substitution");
// Create/update bundle
group.MapPut("/bundles", async (
CreateBundleRequest request,
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var actor = context.Request.Headers["X-Actor"].FirstOrDefault() ?? "system";
var bundle = new LocalizationBundle
{
BundleId = request.BundleId ?? $"bundle-{Guid.NewGuid():N}"[..20],
TenantId = tenantId,
Locale = request.Locale,
Namespace = request.Namespace ?? "default",
Strings = request.Strings,
Priority = request.Priority,
Enabled = request.Enabled,
Description = request.Description,
Source = "api"
};
var result = await localizationService.UpsertBundleAsync(bundle, actor, cancellationToken);
if (!result.Success)
{
return Results.BadRequest(new { error = result.Error });
}
return result.IsNew
? Results.Created($"/api/v2/localization/bundles/{bundle.Locale}", new
{
bundleId = result.BundleId,
message = "Bundle created successfully"
})
: Results.Ok(new
{
bundleId = result.BundleId,
message = "Bundle updated successfully"
});
})
.WithName("UpsertLocalizationBundle")
.WithSummary("Creates or updates a localization bundle");
// Delete bundle
group.MapDelete("/bundles/{bundleId}", async (
string bundleId,
HttpContext context,
ILocalizationService localizationService,
CancellationToken cancellationToken) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var actor = context.Request.Headers["X-Actor"].FirstOrDefault() ?? "system";
var deleted = await localizationService.DeleteBundleAsync(tenantId, bundleId, actor, cancellationToken);
if (!deleted)
{
return Results.NotFound(new { error = $"Bundle '{bundleId}' not found" });
}
return Results.Ok(new { message = $"Bundle '{bundleId}' deleted successfully" });
})
.WithName("DeleteLocalizationBundle")
.WithSummary("Deletes a localization bundle");
// Validate bundle
group.MapPost("/bundles/validate", (
CreateBundleRequest request,
HttpContext context,
ILocalizationService localizationService) =>
{
var tenantId = context.Request.Headers["X-Tenant-Id"].FirstOrDefault() ?? "default";
var bundle = new LocalizationBundle
{
BundleId = request.BundleId ?? "validation",
TenantId = tenantId,
Locale = request.Locale,
Namespace = request.Namespace ?? "default",
Strings = request.Strings,
Priority = request.Priority,
Enabled = request.Enabled,
Description = request.Description
};
var result = localizationService.Validate(bundle);
return Results.Ok(new
{
result.IsValid,
result.Errors,
result.Warnings
});
})
.WithName("ValidateLocalizationBundle")
.WithSummary("Validates a localization bundle without saving");
return group;
}
}
/// <summary>
/// Request to format a localized string.
/// </summary>
public sealed record FormatStringRequest
{
/// <summary>
/// Target locale.
/// </summary>
public string? Locale { get; init; }
/// <summary>
/// Parameters for substitution.
/// </summary>
public Dictionary<string, object>? Parameters { get; init; }
}
/// <summary>
/// Request to create/update a localization bundle.
/// </summary>
public sealed record CreateBundleRequest
{
/// <summary>
/// Bundle ID (auto-generated if not provided).
/// </summary>
public string? BundleId { get; init; }
/// <summary>
/// Locale code.
/// </summary>
public required string Locale { get; init; }
/// <summary>
/// Namespace/category.
/// </summary>
public string? Namespace { get; init; }
/// <summary>
/// Localized strings.
/// </summary>
public required Dictionary<string, string> Strings { get; init; }
/// <summary>
/// Bundle priority.
/// </summary>
public int Priority { get; init; }
/// <summary>
/// Whether bundle is enabled.
/// </summary>
public bool Enabled { get; init; } = true;
/// <summary>
/// Bundle description.
/// </summary>
public string? Description { get; init; }
}
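
The upsert endpoint above auto-generates a bundle ID with `$"bundle-{Guid.NewGuid():N}"[..20]` when the request omits one. As a small sketch of what that expression yields: the `bundle-` prefix takes 7 characters, so the 20-character slice keeps only the first 13 hex digits of the GUID. Whether that much entropy is sufficient is a judgment call for the service owner; the snippet only illustrates the shape of the value.

```csharp
using System;

// Same expression as the endpoint: format the GUID as 32 hex digits, then slice.
var id = $"bundle-{Guid.NewGuid():N}"[..20];

// "bundle-" is 7 characters, leaving 13 hex digits from the GUID.
var suffix = id["bundle-".Length..];

Console.WriteLine(id.Length);      // 20
Console.WriteLine(suffix.Length);  // 13
```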

@@ -0,0 +1,718 @@
using System.Collections.Immutable;
using System.Text.Json;
using System.Text.Json.Nodes;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Routing;
using StellaOps.Notifier.WebService.Contracts;
using StellaOps.Notifier.Worker.Dispatch;
using StellaOps.Notifier.Worker.Templates;
using StellaOps.Notify.Models;
using StellaOps.Notify.Storage.Mongo.Repositories;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// API endpoints for rules, templates, and incidents management.
/// </summary>
public static class NotifyApiEndpoints
{
/// <summary>
/// Maps all Notify API v2 endpoints.
/// </summary>
public static IEndpointRouteBuilder MapNotifyApiV2(this IEndpointRouteBuilder app)
{
var group = app.MapGroup("/api/v2/notify")
.WithTags("Notify")
.WithOpenApi();
// Rules CRUD
MapRulesEndpoints(group);
// Templates CRUD + Preview
MapTemplatesEndpoints(group);
// Incidents
MapIncidentsEndpoints(group);
return app;
}
private static void MapRulesEndpoints(RouteGroupBuilder group)
{
group.MapGet("/rules", async (
HttpContext context,
INotifyRuleRepository ruleRepository,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var rules = await ruleRepository.ListAsync(tenantId, cancellationToken);
var response = rules.Select(MapRuleToResponse).ToList();
return Results.Ok(response);
});
group.MapGet("/rules/{ruleId}", async (
HttpContext context,
string ruleId,
INotifyRuleRepository ruleRepository,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var rule = await ruleRepository.GetAsync(tenantId, ruleId, cancellationToken);
if (rule is null)
{
return Results.NotFound(Error("rule_not_found", $"Rule {ruleId} not found.", context));
}
return Results.Ok(MapRuleToResponse(rule));
});
group.MapPost("/rules", async (
HttpContext context,
RuleCreateRequest request,
INotifyRuleRepository ruleRepository,
INotifyAuditRepository auditRepository,
TimeProvider timeProvider,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var actor = GetActor(context);
var now = timeProvider.GetUtcNow();
var rule = MapRequestToRule(request, tenantId, actor, now);
await ruleRepository.UpsertAsync(rule, cancellationToken);
await AuditAsync(auditRepository, tenantId, "rule.created", actor, new Dictionary<string, string>
{
["ruleId"] = rule.RuleId,
["name"] = rule.Name
}, cancellationToken);
return Results.Created($"/api/v2/notify/rules/{rule.RuleId}", MapRuleToResponse(rule));
});
group.MapPut("/rules/{ruleId}", async (
HttpContext context,
string ruleId,
RuleUpdateRequest request,
INotifyRuleRepository ruleRepository,
INotifyAuditRepository auditRepository,
TimeProvider timeProvider,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var existing = await ruleRepository.GetAsync(tenantId, ruleId, cancellationToken);
if (existing is null)
{
return Results.NotFound(Error("rule_not_found", $"Rule {ruleId} not found.", context));
}
var actor = GetActor(context);
var now = timeProvider.GetUtcNow();
var updated = ApplyRuleUpdate(existing, request, actor, now);
await ruleRepository.UpsertAsync(updated, cancellationToken);
await AuditAsync(auditRepository, tenantId, "rule.updated", actor, new Dictionary<string, string>
{
["ruleId"] = updated.RuleId,
["name"] = updated.Name
}, cancellationToken);
return Results.Ok(MapRuleToResponse(updated));
});
group.MapDelete("/rules/{ruleId}", async (
HttpContext context,
string ruleId,
INotifyRuleRepository ruleRepository,
INotifyAuditRepository auditRepository,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var existing = await ruleRepository.GetAsync(tenantId, ruleId, cancellationToken);
if (existing is null)
{
return Results.NotFound(Error("rule_not_found", $"Rule {ruleId} not found.", context));
}
var actor = GetActor(context);
await ruleRepository.DeleteAsync(tenantId, ruleId, cancellationToken);
await AuditAsync(auditRepository, tenantId, "rule.deleted", actor, new Dictionary<string, string>
{
["ruleId"] = ruleId
}, cancellationToken);
return Results.NoContent();
});
}
private static void MapTemplatesEndpoints(RouteGroupBuilder group)
{
group.MapGet("/templates", async (
HttpContext context,
string? keyPrefix,
string? channelType,
string? locale,
int? limit,
INotifyTemplateService templateService,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
NotifyChannelType? channelTypeEnum = null;
if (!string.IsNullOrWhiteSpace(channelType) &&
Enum.TryParse<NotifyChannelType>(channelType, true, out var parsed))
{
channelTypeEnum = parsed;
}
var templates = await templateService.ListAsync(tenantId, new TemplateListOptions
{
KeyPrefix = keyPrefix,
ChannelType = channelTypeEnum,
Locale = locale,
Limit = limit
}, cancellationToken);
var response = templates.Select(MapTemplateToResponse).ToList();
return Results.Ok(response);
});
group.MapGet("/templates/{templateId}", async (
HttpContext context,
string templateId,
INotifyTemplateService templateService,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var template = await templateService.GetByIdAsync(tenantId, templateId, cancellationToken);
if (template is null)
{
return Results.NotFound(Error("template_not_found", $"Template {templateId} not found.", context));
}
return Results.Ok(MapTemplateToResponse(template));
});
group.MapPost("/templates", async (
HttpContext context,
TemplateCreateRequest request,
INotifyTemplateService templateService,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var actor = GetActor(context);
if (!Enum.TryParse<NotifyChannelType>(request.ChannelType, true, out var channelType))
{
return Results.BadRequest(Error("invalid_channel_type", $"Invalid channel type: {request.ChannelType}", context));
}
var renderMode = NotifyTemplateRenderMode.Markdown;
if (!string.IsNullOrWhiteSpace(request.RenderMode) &&
Enum.TryParse<NotifyTemplateRenderMode>(request.RenderMode, true, out var parsedMode))
{
renderMode = parsedMode;
}
var format = NotifyDeliveryFormat.Json;
if (!string.IsNullOrWhiteSpace(request.Format) &&
Enum.TryParse<NotifyDeliveryFormat>(request.Format, true, out var parsedFormat))
{
format = parsedFormat;
}
var template = NotifyTemplate.Create(
templateId: request.TemplateId,
tenantId: tenantId,
channelType: channelType,
key: request.Key,
locale: request.Locale,
body: request.Body,
renderMode: renderMode,
format: format,
description: request.Description,
metadata: request.Metadata);
var result = await templateService.UpsertAsync(template, actor, cancellationToken);
if (!result.Success)
{
return Results.BadRequest(Error("template_validation_failed", result.Error ?? "Validation failed.", context));
}
var created = await templateService.GetByIdAsync(tenantId, request.TemplateId, cancellationToken);
return result.IsNew
? Results.Created($"/api/v2/notify/templates/{request.TemplateId}", MapTemplateToResponse(created!))
: Results.Ok(MapTemplateToResponse(created!));
});
group.MapDelete("/templates/{templateId}", async (
HttpContext context,
string templateId,
INotifyTemplateService templateService,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var actor = GetActor(context);
var deleted = await templateService.DeleteAsync(tenantId, templateId, actor, cancellationToken);
if (!deleted)
{
return Results.NotFound(Error("template_not_found", $"Template {templateId} not found.", context));
}
return Results.NoContent();
});
group.MapPost("/templates/preview", async (
HttpContext context,
TemplatePreviewRequest request,
INotifyTemplateService templateService,
INotifyTemplateRenderer templateRenderer,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
NotifyTemplate? template = null;
List<string>? warnings = null;
if (!string.IsNullOrWhiteSpace(request.TemplateId))
{
template = await templateService.GetByIdAsync(tenantId, request.TemplateId, cancellationToken);
if (template is null)
{
return Results.NotFound(Error("template_not_found", $"Template {request.TemplateId} not found.", context));
}
}
else if (!string.IsNullOrWhiteSpace(request.TemplateBody))
{
var validation = templateService.Validate(request.TemplateBody);
if (!validation.IsValid)
{
return Results.BadRequest(Error("template_invalid", string.Join("; ", validation.Errors), context));
}
warnings = validation.Warnings.ToList();
var format = NotifyDeliveryFormat.PlainText;
if (!string.IsNullOrWhiteSpace(request.OutputFormat) &&
Enum.TryParse<NotifyDeliveryFormat>(request.OutputFormat, true, out var parsedFormat))
{
format = parsedFormat;
}
template = NotifyTemplate.Create(
templateId: "preview",
tenantId: tenantId,
channelType: NotifyChannelType.Custom,
key: "preview",
locale: "en-us",
body: request.TemplateBody,
format: format);
}
else
{
return Results.BadRequest(Error("template_required", "Either templateId or templateBody must be provided.", context));
}
var sampleEvent = NotifyEvent.Create(
eventId: Guid.NewGuid(),
kind: request.EventKind ?? "preview.event",
tenant: tenantId,
ts: DateTimeOffset.UtcNow,
payload: request.SamplePayload ?? new JsonObject(),
attributes: request.SampleAttributes ?? new Dictionary<string, string>(),
actor: "preview",
version: "1");
var rendered = await templateRenderer.RenderAsync(template, sampleEvent, cancellationToken);
return Results.Ok(new TemplatePreviewResponse
{
RenderedBody = rendered.Body,
RenderedSubject = rendered.Subject,
BodyHash = rendered.BodyHash,
Format = rendered.Format.ToString(),
Warnings = warnings
});
});
group.MapPost("/templates/validate", (
HttpContext context,
TemplatePreviewRequest request,
INotifyTemplateService templateService) =>
{
if (string.IsNullOrWhiteSpace(request.TemplateBody))
{
return Results.BadRequest(Error("template_body_required", "templateBody is required.", context));
}
var result = templateService.Validate(request.TemplateBody);
return Results.Ok(new
{
isValid = result.IsValid,
errors = result.Errors,
warnings = result.Warnings
});
});
}
private static void MapIncidentsEndpoints(RouteGroupBuilder group)
{
group.MapGet("/incidents", async (
HttpContext context,
string? status,
string? eventKindPrefix,
DateTimeOffset? since,
DateTimeOffset? until,
int? limit,
string? cursor,
INotifyDeliveryRepository deliveryRepository,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
// For now, return recent deliveries grouped by event ID as "incidents".
// Full incident correlation will be implemented in NOTIFY-SVC-39-001.
// Note: eventKindPrefix and until are accepted but not yet applied to the query.
var queryResult = await deliveryRepository.QueryAsync(tenantId, since, status, limit ?? 100, cursor, cancellationToken);
var deliveries = queryResult.Items;
var incidents = deliveries
.GroupBy(d => d.EventId)
.Select(g => new IncidentResponse
{
IncidentId = g.Key.ToString(),
TenantId = tenantId,
EventKind = g.First().Kind,
Status = g.All(d => d.Status == NotifyDeliveryStatus.Delivered) ? "resolved" : "open",
Severity = "medium",
Title = $"Notification: {g.First().Kind}",
Description = null,
EventCount = g.Count(),
FirstOccurrence = g.Min(d => d.CreatedAt),
LastOccurrence = g.Max(d => d.CreatedAt),
Labels = null,
Metadata = null
})
.ToList();
return Results.Ok(new IncidentListResponse
{
Incidents = incidents,
TotalCount = incidents.Count,
NextCursor = queryResult.ContinuationToken
});
});
group.MapPost("/incidents/{incidentId}/ack", async (
HttpContext context,
string incidentId,
IncidentAckRequest request,
INotifyAuditRepository auditRepository,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var actor = request.Actor ?? GetActor(context);
await AuditAsync(auditRepository, tenantId, "incident.acknowledged", actor, new Dictionary<string, string>
{
["incidentId"] = incidentId,
["comment"] = request.Comment ?? ""
}, cancellationToken);
return Results.NoContent();
});
group.MapPost("/incidents/{incidentId}/resolve", async (
HttpContext context,
string incidentId,
IncidentResolveRequest request,
INotifyAuditRepository auditRepository,
CancellationToken cancellationToken) =>
{
var tenantId = GetTenantId(context);
if (tenantId is null)
{
return Results.BadRequest(Error("tenant_missing", "X-StellaOps-Tenant header is required.", context));
}
var actor = request.Actor ?? GetActor(context);
await AuditAsync(auditRepository, tenantId, "incident.resolved", actor, new Dictionary<string, string>
{
["incidentId"] = incidentId,
["reason"] = request.Reason ?? "",
["comment"] = request.Comment ?? ""
}, cancellationToken);
return Results.NoContent();
});
}
#region Helpers
private static string? GetTenantId(HttpContext context)
{
var value = context.Request.Headers["X-StellaOps-Tenant"].ToString();
return string.IsNullOrWhiteSpace(value) ? null : value;
}
private static string GetActor(HttpContext context)
{
return context.Request.Headers["X-StellaOps-Actor"].ToString() is { Length: > 0 } actor
? actor
: "api";
}
private static object Error(string code, string message, HttpContext context) => new
{
error = new
{
code,
message,
traceId = context.TraceIdentifier
}
};
private static async Task AuditAsync(
INotifyAuditRepository repository,
string tenantId,
string action,
string actor,
Dictionary<string, string> metadata,
CancellationToken cancellationToken)
{
try
{
await repository.AppendAsync(tenantId, action, actor, metadata, cancellationToken);
}
catch
{
// Ignore audit failures
}
}
#endregion
#region Mappers
private static RuleResponse MapRuleToResponse(NotifyRule rule)
{
return new RuleResponse
{
RuleId = rule.RuleId,
TenantId = rule.TenantId,
Name = rule.Name,
Description = rule.Description,
Enabled = rule.Enabled,
Match = new RuleMatchResponse
{
EventKinds = rule.Match.EventKinds.ToList(),
Namespaces = rule.Match.Namespaces.ToList(),
Repositories = rule.Match.Repositories.ToList(),
Digests = rule.Match.Digests.ToList(),
Labels = rule.Match.Labels.ToList(),
ComponentPurls = rule.Match.ComponentPurls.ToList(),
MinSeverity = rule.Match.MinSeverity,
Verdicts = rule.Match.Verdicts.ToList(),
KevOnly = rule.Match.KevOnly
},
Actions = rule.Actions.Select(a => new RuleActionResponse
{
ActionId = a.ActionId,
Channel = a.Channel,
Template = a.Template,
Digest = a.Digest,
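// Emit an ISO 8601 duration (e.g. PT5M) so the value round-trips through
// XmlConvert.ToTimeSpan on create/update; TimeSpan.ToString() would emit 00:05:00.
Throttle = a.Throttle is { } throttle ? System.Xml.XmlConvert.ToString(throttle) : null,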
Locale = a.Locale,
Enabled = a.Enabled,
Metadata = a.Metadata.ToDictionary(kv => kv.Key, kv => kv.Value)
}).ToList(),
Labels = rule.Labels.ToDictionary(kv => kv.Key, kv => kv.Value),
Metadata = rule.Metadata.ToDictionary(kv => kv.Key, kv => kv.Value),
CreatedBy = rule.CreatedBy,
CreatedAt = rule.CreatedAt,
UpdatedBy = rule.UpdatedBy,
UpdatedAt = rule.UpdatedAt
};
}
private static NotifyRule MapRequestToRule(
RuleCreateRequest request,
string tenantId,
string actor,
DateTimeOffset now)
{
var match = NotifyRuleMatch.Create(
eventKinds: request.Match.EventKinds,
namespaces: request.Match.Namespaces,
repositories: request.Match.Repositories,
digests: request.Match.Digests,
labels: request.Match.Labels,
componentPurls: request.Match.ComponentPurls,
minSeverity: request.Match.MinSeverity,
verdicts: request.Match.Verdicts,
kevOnly: request.Match.KevOnly);
var actions = request.Actions.Select(a => NotifyRuleAction.Create(
actionId: a.ActionId,
channel: a.Channel,
template: a.Template,
digest: a.Digest,
throttle: string.IsNullOrWhiteSpace(a.Throttle) ? null : System.Xml.XmlConvert.ToTimeSpan(a.Throttle),
locale: a.Locale,
enabled: a.Enabled,
metadata: a.Metadata));
return NotifyRule.Create(
ruleId: request.RuleId,
tenantId: tenantId,
name: request.Name,
match: match,
actions: actions,
enabled: request.Enabled,
description: request.Description,
labels: request.Labels,
metadata: request.Metadata,
createdBy: actor,
createdAt: now,
updatedBy: actor,
updatedAt: now);
}
private static NotifyRule ApplyRuleUpdate(
NotifyRule existing,
RuleUpdateRequest request,
string actor,
DateTimeOffset now)
{
var match = request.Match is not null
? NotifyRuleMatch.Create(
eventKinds: request.Match.EventKinds ?? existing.Match.EventKinds.ToList(),
namespaces: request.Match.Namespaces ?? existing.Match.Namespaces.ToList(),
repositories: request.Match.Repositories ?? existing.Match.Repositories.ToList(),
digests: request.Match.Digests ?? existing.Match.Digests.ToList(),
labels: request.Match.Labels ?? existing.Match.Labels.ToList(),
componentPurls: request.Match.ComponentPurls ?? existing.Match.ComponentPurls.ToList(),
minSeverity: request.Match.MinSeverity ?? existing.Match.MinSeverity,
verdicts: request.Match.Verdicts ?? existing.Match.Verdicts.ToList(),
kevOnly: request.Match.KevOnly ?? existing.Match.KevOnly)
: existing.Match;
var actions = request.Actions is not null
? request.Actions.Select(a => NotifyRuleAction.Create(
actionId: a.ActionId,
channel: a.Channel,
template: a.Template,
digest: a.Digest,
throttle: string.IsNullOrWhiteSpace(a.Throttle) ? null : System.Xml.XmlConvert.ToTimeSpan(a.Throttle),
locale: a.Locale,
enabled: a.Enabled,
metadata: a.Metadata))
: existing.Actions;
return NotifyRule.Create(
ruleId: existing.RuleId,
tenantId: existing.TenantId,
name: request.Name ?? existing.Name,
match: match,
actions: actions,
enabled: request.Enabled ?? existing.Enabled,
description: request.Description ?? existing.Description,
labels: request.Labels ?? existing.Labels.ToDictionary(kv => kv.Key, kv => kv.Value),
metadata: request.Metadata ?? existing.Metadata.ToDictionary(kv => kv.Key, kv => kv.Value),
createdBy: existing.CreatedBy,
createdAt: existing.CreatedAt,
updatedBy: actor,
updatedAt: now);
}
private static TemplateResponse MapTemplateToResponse(NotifyTemplate template)
{
return new TemplateResponse
{
TemplateId = template.TemplateId,
TenantId = template.TenantId,
Key = template.Key,
ChannelType = template.ChannelType.ToString(),
Locale = template.Locale,
Body = template.Body,
RenderMode = template.RenderMode.ToString(),
Format = template.Format.ToString(),
Description = template.Description,
Metadata = template.Metadata.ToDictionary(kv => kv.Key, kv => kv.Value),
CreatedBy = template.CreatedBy,
CreatedAt = template.CreatedAt,
UpdatedBy = template.UpdatedBy,
UpdatedAt = template.UpdatedAt
};
}
#endregion
}
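
The rule mappers above parse an action's `Throttle` string with `System.Xml.XmlConvert.ToTimeSpan`, which expects an ISO 8601 duration such as `PT5M` and throws `FormatException` for other shapes. The sketch below shows one defensive way to wrap that call. `ParseThrottle` is a hypothetical helper, not part of this commit, and returning `null` on bad input is an assumption; an endpoint might prefer to answer with a 400 instead.

```csharp
using System;
using System.Xml;

// Hypothetical helper: returns null for empty or malformed durations instead of throwing.
TimeSpan? ParseThrottle(string? value)
{
    if (string.IsNullOrWhiteSpace(value))
    {
        return null;
    }

    try
    {
        return XmlConvert.ToTimeSpan(value);
    }
    catch (FormatException)
    {
        return null;
    }
}

var fiveMinutes = ParseThrottle("PT5M");
var clockStyle = ParseThrottle("00:05:00"); // not an ISO 8601 duration
var roundTrip = XmlConvert.ToString(TimeSpan.FromMinutes(5));

Console.WriteLine(fiveMinutes);        // 00:05:00
Console.WriteLine(clockStyle is null); // True
Console.WriteLine(roundTrip);          // PT5M
```

As written in the diff, a malformed throttle would surface as an unhandled exception from the create and update handlers, so some form of validation before `NotifyRuleAction.Create` may be worth considering.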

@@ -0,0 +1,407 @@
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Routing;
using StellaOps.Notifier.Worker.Observability;
namespace StellaOps.Notifier.WebService.Endpoints;
/// <summary>
/// REST API endpoints for observability services.
/// </summary>
public static class ObservabilityEndpoints
{
/// <summary>
/// Maps observability endpoints.
/// </summary>
public static IEndpointRouteBuilder MapObservabilityEndpoints(this IEndpointRouteBuilder endpoints)
{
var group = endpoints.MapGroup("/api/v1/observability")
.WithTags("Observability");
// Metrics endpoints
group.MapGet("/metrics", GetMetricsSnapshot)
.WithName("GetMetricsSnapshot")
.WithSummary("Gets current metrics snapshot");
group.MapGet("/metrics/{tenantId}", GetTenantMetrics)
.WithName("GetTenantMetrics")
.WithSummary("Gets metrics for a specific tenant");
// Dead letter endpoints
group.MapGet("/dead-letters/{tenantId}", GetDeadLetters)
.WithName("GetDeadLetters")
.WithSummary("Lists dead letter entries for a tenant");
group.MapGet("/dead-letters/{tenantId}/{entryId}", GetDeadLetterEntry)
.WithName("GetDeadLetterEntry")
.WithSummary("Gets a specific dead letter entry");
group.MapPost("/dead-letters/{tenantId}/{entryId}/retry", RetryDeadLetter)
.WithName("RetryDeadLetter")
.WithSummary("Retries a dead letter entry");
group.MapPost("/dead-letters/{tenantId}/{entryId}/discard", DiscardDeadLetter)
.WithName("DiscardDeadLetter")
.WithSummary("Discards a dead letter entry");
group.MapGet("/dead-letters/{tenantId}/stats", GetDeadLetterStats)
.WithName("GetDeadLetterStats")
.WithSummary("Gets dead letter statistics");
group.MapDelete("/dead-letters/{tenantId}/purge", PurgeDeadLetters)
.WithName("PurgeDeadLetters")
.WithSummary("Purges old dead letter entries");
// Chaos testing endpoints
group.MapGet("/chaos/experiments", ListChaosExperiments)
.WithName("ListChaosExperiments")
.WithSummary("Lists chaos experiments");
group.MapGet("/chaos/experiments/{experimentId}", GetChaosExperiment)
.WithName("GetChaosExperiment")
.WithSummary("Gets a chaos experiment");
group.MapPost("/chaos/experiments", StartChaosExperiment)
.WithName("StartChaosExperiment")
.WithSummary("Starts a new chaos experiment");
group.MapPost("/chaos/experiments/{experimentId}/stop", StopChaosExperiment)
.WithName("StopChaosExperiment")
.WithSummary("Stops a running chaos experiment");
group.MapGet("/chaos/experiments/{experimentId}/results", GetChaosResults)
.WithName("GetChaosResults")
.WithSummary("Gets chaos experiment results");
// Retention policy endpoints
group.MapGet("/retention/policies", ListRetentionPolicies)
.WithName("ListRetentionPolicies")
.WithSummary("Lists retention policies");
group.MapGet("/retention/policies/{policyId}", GetRetentionPolicy)
.WithName("GetRetentionPolicy")
.WithSummary("Gets a retention policy");
group.MapPost("/retention/policies", CreateRetentionPolicy)
.WithName("CreateRetentionPolicy")
.WithSummary("Creates a retention policy");
group.MapPut("/retention/policies/{policyId}", UpdateRetentionPolicy)
.WithName("UpdateRetentionPolicy")
.WithSummary("Updates a retention policy");
group.MapDelete("/retention/policies/{policyId}", DeleteRetentionPolicy)
.WithName("DeleteRetentionPolicy")
.WithSummary("Deletes a retention policy");
group.MapPost("/retention/execute", ExecuteRetention)
.WithName("ExecuteRetention")
.WithSummary("Executes retention policies");
group.MapGet("/retention/policies/{policyId}/preview", PreviewRetention)
.WithName("PreviewRetention")
.WithSummary("Previews retention policy effects");
group.MapGet("/retention/policies/{policyId}/history", GetRetentionHistory)
.WithName("GetRetentionHistory")
.WithSummary("Gets retention execution history");
return endpoints;
}
// Metrics handlers
private static IResult GetMetricsSnapshot(
[FromServices] INotifierMetrics metrics)
{
var snapshot = metrics.GetSnapshot();
return Results.Ok(snapshot);
}
private static IResult GetTenantMetrics(
string tenantId,
[FromServices] INotifierMetrics metrics)
{
var snapshot = metrics.GetSnapshot(tenantId);
return Results.Ok(snapshot);
}
// Dead letter handlers
private static async Task<IResult> GetDeadLetters(
string tenantId,
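[FromQuery] int? limit,
[FromQuery] int? offset,
[FromServices] IDeadLetterHandler handler,
CancellationToken ct)
{
// Paging parameters are nullable so callers may omit them; a non-nullable
// int is treated as required and a missing value is rejected with 400.
var entries = await handler.GetEntriesAsync(
tenantId,
limit: limit is > 0 ? limit.Value : 100,
offset: offset ?? 0,
ct: ct);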
return Results.Ok(entries);
}
private static async Task<IResult> GetDeadLetterEntry(
string tenantId,
string entryId,
[FromServices] IDeadLetterHandler handler,
CancellationToken ct)
{
var entry = await handler.GetEntryAsync(tenantId, entryId, ct);
if (entry is null)
{
return Results.NotFound(new { error = "Dead letter entry not found" });
}
return Results.Ok(entry);
}
private static async Task<IResult> RetryDeadLetter(
string tenantId,
string entryId,
[FromBody] RetryDeadLetterRequest request,
[FromServices] IDeadLetterHandler handler,
CancellationToken ct)
{
var result = await handler.RetryAsync(tenantId, entryId, request.Actor, ct);
return Results.Ok(result);
}
private static async Task<IResult> DiscardDeadLetter(
string tenantId,
string entryId,
[FromBody] DiscardDeadLetterRequest request,
[FromServices] IDeadLetterHandler handler,
CancellationToken ct)
{
await handler.DiscardAsync(tenantId, entryId, request.Reason, request.Actor, ct);
return Results.NoContent();
}
private static async Task<IResult> GetDeadLetterStats(
string tenantId,
[FromQuery] int? windowHours,
[FromServices] IDeadLetterHandler handler,
CancellationToken ct)
{
var window = windowHours.HasValue
? TimeSpan.FromHours(windowHours.Value)
: (TimeSpan?)null;
var stats = await handler.GetStatisticsAsync(tenantId, window, ct);
return Results.Ok(stats);
}
private static async Task<IResult> PurgeDeadLetters(
string tenantId,
[FromQuery] int? olderThanDays,
[FromServices] IDeadLetterHandler handler,
CancellationToken ct)
{
// Falls back to a 7-day cutoff when the parameter is omitted or not positive.
var olderThan = TimeSpan.FromDays(olderThanDays is > 0 ? olderThanDays.Value : 7);
var count = await handler.PurgeAsync(tenantId, olderThan, ct);
return Results.Ok(new { purged = count });
}
// Chaos testing handlers
private static async Task<IResult> ListChaosExperiments(
[FromQuery] string? status,
[FromQuery] int? limit,
[FromServices] IChaosTestRunner runner,
CancellationToken ct)
{
// An unrecognized status value is ignored and no status filter is applied.
ChaosExperimentStatus? parsedStatus = null;
if (!string.IsNullOrEmpty(status) && Enum.TryParse<ChaosExperimentStatus>(status, true, out var s))
{
parsedStatus = s;
}
// Falls back to 100 results when limit is omitted or not positive.
var experiments = await runner.ListExperimentsAsync(parsedStatus, limit is > 0 ? limit.Value : 100, ct);
return Results.Ok(experiments);
}
private static async Task<IResult> GetChaosExperiment(
string experimentId,
[FromServices] IChaosTestRunner runner,
CancellationToken ct)
{
var experiment = await runner.GetExperimentAsync(experimentId, ct);
if (experiment is null)
{
return Results.NotFound(new { error = "Experiment not found" });
}
return Results.Ok(experiment);
}
private static async Task<IResult> StartChaosExperiment(
[FromBody] ChaosExperimentConfig config,
[FromServices] IChaosTestRunner runner,
CancellationToken ct)
{
try
{
var experiment = await runner.StartExperimentAsync(config, ct);
return Results.Created($"/api/v1/observability/chaos/experiments/{experiment.Id}", experiment);
}
catch (InvalidOperationException ex)
{
return Results.BadRequest(new { error = ex.Message });
}
catch (UnauthorizedAccessException)
{
return Results.Forbid();
}
}
private static async Task<IResult> StopChaosExperiment(
string experimentId,
[FromServices] IChaosTestRunner runner,
CancellationToken ct)
{
await runner.StopExperimentAsync(experimentId, ct);
return Results.NoContent();
}
private static async Task<IResult> GetChaosResults(
string experimentId,
[FromServices] IChaosTestRunner runner,
CancellationToken ct)
{
var results = await runner.GetResultsAsync(experimentId, ct);
return Results.Ok(results);
}
// Retention policy handlers
private static async Task<IResult> ListRetentionPolicies(
[FromQuery] string? tenantId,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
var policies = await service.ListPoliciesAsync(tenantId, ct);
return Results.Ok(policies);
}
private static async Task<IResult> GetRetentionPolicy(
string policyId,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
var policy = await service.GetPolicyAsync(policyId, ct);
if (policy is null)
{
return Results.NotFound(new { error = "Policy not found" });
}
return Results.Ok(policy);
}
private static async Task<IResult> CreateRetentionPolicy(
[FromBody] RetentionPolicy policy,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
try
{
await service.RegisterPolicyAsync(policy, ct);
return Results.Created($"/api/v1/observability/retention/policies/{policy.Id}", policy);
}
catch (InvalidOperationException ex)
{
return Results.Conflict(new { error = ex.Message });
}
catch (ArgumentException ex)
{
return Results.BadRequest(new { error = ex.Message });
}
}
private static async Task<IResult> UpdateRetentionPolicy(
string policyId,
[FromBody] RetentionPolicy policy,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
try
{
await service.UpdatePolicyAsync(policyId, policy, ct);
return Results.Ok(policy);
}
catch (KeyNotFoundException)
{
return Results.NotFound(new { error = "Policy not found" });
}
catch (ArgumentException ex)
{
return Results.BadRequest(new { error = ex.Message });
}
}
private static async Task<IResult> DeleteRetentionPolicy(
string policyId,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
await service.DeletePolicyAsync(policyId, ct);
return Results.NoContent();
}
private static async Task<IResult> ExecuteRetention(
[FromQuery] string? policyId,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
var result = await service.ExecuteRetentionAsync(policyId, ct);
return Results.Ok(result);
}
private static async Task<IResult> PreviewRetention(
string policyId,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
try
{
var preview = await service.PreviewRetentionAsync(policyId, ct);
return Results.Ok(preview);
}
catch (KeyNotFoundException)
{
return Results.NotFound(new { error = "Policy not found" });
}
}
private static async Task<IResult> GetRetentionHistory(
string policyId,
[FromQuery] int? limit,
[FromServices] IRetentionPolicyService service,
CancellationToken ct)
{
// Falls back to 100 entries when limit is omitted or not positive.
var history = await service.GetExecutionHistoryAsync(policyId, limit is > 0 ? limit.Value : 100, ct);
return Results.Ok(history);
}
}
/// <summary>
/// Request to retry a dead letter entry.
/// </summary>
public sealed record RetryDeadLetterRequest
{
/// <summary>
/// Actor performing the retry.
/// </summary>
public required string Actor { get; init; }
}
/// <summary>
/// Request to discard a dead letter entry.
/// </summary>
public sealed record DiscardDeadLetterRequest
{
/// <summary>
/// Reason for discarding.
/// </summary>
public required string Reason { get; init; }
/// <summary>
/// Actor performing the discard.
/// </summary>
public required string Actor { get; init; }
}
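// Example request bodies (a sketch, not taken from the repository): assuming the
// default minimal API JSON options (System.Text.Json web defaults, camelCase names),
// the two records above would bind from payloads such as:
//
//   RetryDeadLetterRequest:   { "actor": "ops@example.com" }
//   DiscardDeadLetterRequest: { "reason": "Malformed payload", "actor": "ops@example.com" }
//
// Both properties are marked `required`, so omitting one is expected to fail
// deserialization and produce a 400 response. The sample values are hypothetical.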
