release orchestration strengthening

This commit is contained in: master
2026-01-17 21:32:03 +02:00
parent 195dff2457
commit da27b9faa9
256 changed files with 94634 additions and 2269 deletions


@@ -1,744 +0,0 @@
# Feature Gaps Report - Stella Ops Suite

*(Auto-generated during feature matrix completion)*

This report documents:

1. Features discovered in code but not listed in FEATURE_MATRIX.md
2. CLI/UI coverage gaps for existing features

---
## Batch 1: SBOM & Ingestion
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| SPDX 3.0 Build Attestation | Attestor | `BuildAttestationMapper.cs`, `DsseSpdx3Signer.cs`, `CombinedDocumentBuilder.cs` | - | - | Attestation & Signing |
| CycloneDX CBOM Support | Scanner | `CycloneDxCbomWriter.cs` | - | - | SBOM & Ingestion |
| Trivy DB Export (Offline) | Concelier | `TrivyDbExporterPlugin.cs`, `TrivyDbOrasPusher.cs`, `TrivyDbExportPlanner.cs` | `stella db export trivy` | - | Offline & Air-Gap |
| Layer SBOM Composition | Scanner | `SpdxLayerWriter.cs`, `CycloneDxLayerWriter.cs`, `LayerSbomService.cs` | `stella sbomer layer`, `stella scan layer-sbom` | - | SBOM & Ingestion |
| SBOM Advisory Matching | Concelier | `SbomAdvisoryMatcher.cs`, `SbomRegistryService.cs`, `ValkeyPurlCanonicalIndex.cs` | - | - | Advisory Sources |
| Graph Lineage Service | Graph | `IGraphLineageService.cs`, `InMemoryGraphLineageService.cs`, `LineageContracts.cs` | - | `/graph` | SBOM & Ingestion |
| Evidence Cards (SBOM excerpts) | Evidence.Pack | `IEvidenceCardService.cs`, `EvidenceCardService.cs`, `EvidenceCard.cs` | - | Evidence drawer | Evidence & Findings |
| AirGap SBOM Parsing | AirGap | `SpdxParser.cs`, `CycloneDxParser.cs` | - | `/ops/offline-kit` | Offline & Air-Gap |
| SPDX License Normalization | Scanner | `SpdxLicenseNormalizer.cs`, `SpdxLicenseExpressions.cs`, `SpdxLicenseList.cs` | - | - | Scanning & Detection |
| SBOM Format Conversion | Scanner | `SpdxCycloneDxConverter.cs` | - | - | SBOM & Ingestion |
| SBOM Validation Pipeline | Scanner | `SbomValidationPipeline.cs`, `SemanticSbomExtensions.cs` | - | - | SBOM & Ingestion |
| CycloneDX Evidence Mapping | Scanner | `CycloneDxEvidenceMapper.cs` | - | - | SBOM & Ingestion |
| CycloneDX Pedigree Mapping | Scanner | `CycloneDxPedigreeMapper.cs` | - | - | SBOM & Ingestion |
| SBOM Snapshot Export | Graph | `SbomSnapshot.cs`, `SbomSnapshotExporter.cs` | - | - | Evidence & Findings |
| Lineage Evidence Packs | ExportCenter | `ILineageEvidencePackService.cs`, `LineageEvidencePack.cs`, `LineageExportEndpoints.cs` | - | `/triage/audit-bundles` | Evidence & Findings |
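
The SBOM Format Conversion row names `SpdxCycloneDxConverter.cs` but does not describe the mapping. As a rough illustration of the core of such a conversion, the Python sketch below maps the `packages` array of an SPDX 2.x JSON document to CycloneDX 1.5 `components`. The helper name `spdx_to_cyclonedx`, the blanket `type: "library"`, and the reuse of `SPDXID` as `bom-ref` are assumptions. Licenses, hashes, and relationships are ignored, so this is not the Scanner implementation.

```python
from typing import Any


def spdx_to_cyclonedx(spdx: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: map SPDX 2.x JSON packages to a minimal CycloneDX 1.5 BOM."""
    components = []
    for pkg in spdx.get("packages", []):
        component: dict[str, Any] = {
            "type": "library",          # assumption: every SPDX package is treated as a library
            "bom-ref": pkg["SPDXID"],   # keep the SPDX element ID so the origin stays traceable
            "name": pkg["name"],
        }
        if "versionInfo" in pkg:
            component["version"] = pkg["versionInfo"]
        # SPDX carries the purl as an external reference; CycloneDX has a dedicated field.
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                component["purl"] = ref["referenceLocator"]
                break
        components.append(component)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }
```

A real converter would also have to translate SPDX `relationships` (for example `DEPENDS_ON`) into CycloneDX `dependencies`; the sketch stops at the component list.
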
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Delta-SBOM Cache | SbomService | No | No | Internal optimization - no action needed |
| SBOM Lineage Ledger | SbomService | No | Yes | Add `stella sbom lineage list/show` commands |
| SBOM Lineage API | SbomService | No | Yes | Add `stella sbom lineage export` command |
| SPDX 3.0 Build Attestation | Attestor | No | No | Add to Attestation & Signing matrix section |
| Graph Lineage Service | Graph | No | Yes | Consider `stella graph lineage` command |
| Trivy DB Export | Concelier | Partial | No | `stella db export trivy` exists but may need UI |
---
## Batch 2: Scanning & Detection
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| Secrets Detection (Regex+Entropy) | Scanner | `SecretsAnalyzer.cs`, `RegexDetector.cs`, `EntropyDetector.cs`, `CompositeSecretDetector.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| OS Analyzers - Dpkg (Debian/Ubuntu) | Scanner | `DpkgPackageAnalyzer.cs`, `DpkgStatusParser.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| OS Analyzers - Apk (Alpine) | Scanner | `ApkPackageAnalyzer.cs`, `ApkDatabaseParser.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| OS Analyzers - RPM (RHEL/CentOS) | Scanner | `RpmPackageAnalyzer.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| OS Analyzers - Homebrew (macOS) | Scanner | `HomebrewPackageAnalyzer.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| OS Analyzers - macOS Bundles | Scanner | `MacOsBundleAnalyzer.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| OS Analyzers - Windows (Chocolatey/MSI/WinSxS) | Scanner | `ChocolateyAnalyzer.cs`, `MsiAnalyzer.cs`, `WinSxSAnalyzer.cs` | `stella scan run` | `/findings` | Scanning & Detection |
| Symbol-Level Vulnerability Matching | Scanner | `VulnSurfaceService.cs`, `AdvisorySymbolMapping.cs`, `AffectedSymbol.cs` | - | - | Scanning & Detection |
| SARIF 2.1.0 Export | Scanner | SARIF export in CLI | `stella scan sarif` | - | Scanning & Detection |
| Fidelity Upgrade (Quick->Standard->Deep) | Scanner | `FidelityAwareAnalyzer.UpgradeFidelityAsync()` | - | - | Scanning & Detection |
| OCI Multi-Architecture Support | Scanner | `OciImageInspector.cs` (amd64, arm64, etc.) | `stella image inspect` | - | Scanning & Detection |
| Symlink Resolution (32-level depth) | Scanner | `LayeredRootFileSystem.cs` | - | - | Scanning & Detection |
| Whiteout File Support | Scanner | `LayeredRootFileSystem.cs` | - | - | Scanning & Detection |
| NATS/Redis Scan Queue | Scanner | `NatsScanQueue.cs`, `RedisScanQueue.cs` | - | `/ops/scanner` | Operations |
| Determinism Controls | Scanner | `DeterminismContext.cs`, `DeterministicTimeProvider.cs`, `DeterministicRandomProvider.cs` | `stella scan replay` | `/ops/scanner` | Determinism & Reproducibility |
| Lease-Based Job Processing | Scanner | `LeaseHeartbeatService.cs`, `ScanJobProcessor.cs` | - | - | Operations |
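
The Symlink Resolution and Whiteout File Support rows both point at `LayeredRootFileSystem.cs`. Whiteouts follow the OCI image-spec convention: a file named `.wh.<name>` in an upper layer deletes `<name>` (and anything below it) from lower layers, and `.wh..wh..opq` marks its directory as opaque, hiding all lower-layer contents of that directory. The Python sketch below applies those rules to layers modeled as `{path: content}` dicts. The data model and the `apply_layers` name are assumptions, and symlink resolution is left out.

```python
WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def _split(path: str) -> tuple[str, str]:
    parent, _, base = path.rpartition("/")
    return parent, base


def _remove_tree(fs: dict[str, str], target: str) -> None:
    """Delete `target` and everything below it from the merged view."""
    fs.pop(target, None)
    for existing in [p for p in fs if p.startswith(target + "/")]:
        del fs[existing]


def apply_layers(layers: list[dict[str, str]]) -> dict[str, str]:
    """Hypothetical helper: merge layers bottom-to-top, honoring OCI whiteout markers."""
    fs: dict[str, str] = {}
    for layer in layers:
        # Pass 1: markers only hide content from *lower* layers,
        # so apply them before this layer's own files are added.
        for path in layer:
            parent, base = _split(path)
            if base == OPAQUE_MARKER:
                prefix = parent + "/" if parent else ""
                for existing in [p for p in fs if p.startswith(prefix)]:
                    del fs[existing]
            elif base.startswith(WHITEOUT_PREFIX):
                name = base[len(WHITEOUT_PREFIX):]
                _remove_tree(fs, f"{parent}/{name}" if parent else name)
        # Pass 2: add or overwrite the regular entries of this layer.
        for path, content in layer.items():
            if not _split(path)[1].startswith(WHITEOUT_PREFIX):
                fs[path] = content
    return fs
```

Resolving symlinks (the 32-level depth limit mentioned above) would be a separate step over the merged view: follow each link and fail once the hop count exceeds the limit.
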
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| License-Risk Detection | Scanner | No | No | Planned Q4-2025 - not yet implemented |
| Secrets Detection | Scanner | Implicit | Implicit | Document in matrix (runs automatically during scan) |
| OS Package Analyzers | Scanner | Implicit | Implicit | Document in matrix (6 OS-level analyzers) |
| Symbol-Level Matching | Scanner | No | No | Advanced feature - consider exposing in findings detail |
| SARIF Export | Scanner | Yes | No | Consider adding SARIF download in UI |
| Concurrent Worker Config | Scanner | No | Yes | CLI option for worker count would help CI/CD |
---
## Batch 3: Reachability Analysis
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| 8-State Reachability Lattice | Reachability.Core | `ReachabilityLattice.cs` (28 state transitions) | - | `/reachability` | Reachability Analysis |
| Confidence Calculator | Reachability.Core | `ConfidenceCalculator.cs` (path/guard/hit bonuses) | - | - | Reachability Analysis |
| Evidence Weighted Score (EWS) | Signals | `EvidenceWeightedScoreCalculator.cs` (6 dimensions: RCH/RTS/BKP/XPL/SRC/MIT) | - | - | Scoring & Risk |
| Attested Reduction Scoring | Signals | VEX anchoring with short-circuit rules | - | - | Scoring & Risk |
| Hybrid Reachability Query | Reachability.Core | `IReachabilityIndex.cs` (static/runtime/hybrid/batch modes) | `stella reachgraph slice` | `/reachability` | Reachability Analysis |
| Reachability Replay/Verify | ReachGraph | `IReachabilityReplayService.VerifyAsync()` | `stella reachgraph replay/verify` | - | Determinism & Reproducibility |
| Graph Triple-Layer Storage | ReachGraph | `ReachGraphStoreService.cs` (Cache->DB->Archive) | - | - | Operations |
| Per-Graph Signing | ReachGraph | SHA256 artifact/provenance digests | - | - | Attestation & Signing |
| Graphviz/Mermaid Export | CLI | `stella reachability show --format dot/mermaid` | `stella reachability show` | - | Reachability Analysis |
| Reachability Drift Alerts | Docs | `19-reachability-drift-alert-flow.md` (state transition monitoring) | `stella drift` | - | Reachability Analysis |
| Evidence URIs | ReachGraph | `stella://reachgraph/{digest}/slice/{symbolId}` format | - | - | Evidence & Findings |
| Environment Guard Detection | Scanner | 20+ patterns (process.env, sys.platform, etc.) | - | `/reachability` | Reachability Analysis |
| Dynamic Loading Detection | Scanner | require(variable), import(variable), Class.forName() | - | - | Reachability Analysis |
| Reflection Call Detection | Scanner | Confidence scoring 0.5-0.6 for dynamic paths | - | - | Reachability Analysis |
| EWS Guardrails | Signals | Speculative cap (45), not-affected cap (15), runtime floor (60) | - | - | Scoring & Risk |
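
The EWS rows give the six dimension codes and three guardrail values (speculative cap 45, not-affected cap 15, runtime floor 60) but not the formula. The Python sketch below shows one plausible way to combine them, under loudly stated assumptions: the weights are invented, each dimension is a pre-normalized value in [0, 1] where higher means riskier, MIT is read as mitigation and subtracted, and the guardrail triggers are guesses from their labels. The `Evidence` and `ews` names are hypothetical; this is not the `EvidenceWeightedScoreCalculator.cs` algorithm.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; they sum to 1.0.
WEIGHTS = {"rch": 0.25, "rts": 0.20, "bkp": 0.10, "xpl": 0.25, "src": 0.20}
MIT_WEIGHT = 0.10  # assumption: mitigation evidence lowers the score

# Guardrail values as listed in the table above.
SPECULATIVE_CAP = 45
NOT_AFFECTED_CAP = 15
RUNTIME_FLOOR = 60


@dataclass
class Evidence:
    """Hypothetical input; every dimension is assumed to be normalized to [0, 1]."""
    rch: float = 0.0
    rts: float = 0.0
    bkp: float = 0.0
    xpl: float = 0.0
    src: float = 0.0
    mit: float = 0.0
    vex_not_affected: bool = False


def ews(e: Evidence) -> float:
    raw = sum(w * getattr(e, dim) for dim, w in WEIGHTS.items()) - MIT_WEIGHT * e.mit
    score = max(0.0, min(100.0, 100.0 * raw))
    if e.rts > 0:
        # Assumed trigger: any runtime observation lifts the score to the floor.
        score = max(score, RUNTIME_FLOOR)
    elif e.rch == 0:
        # Assumed trigger: no reachability and no runtime evidence is "speculative".
        score = min(score, SPECULATIVE_CAP)
    if e.vex_not_affected:
        # Applied last so a not_affected VEX statement always wins.
        score = min(score, NOT_AFFECTED_CAP)
    return round(float(score), 2)
```

With these invented weights, `Evidence(bkp=1, xpl=1, src=1)` has a raw score of 55 but is capped at 45, because nothing shows the vulnerable code is reachable or executed.
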
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Runtime Signal Correlation | Signals | No | Yes | Add `stella signals inspect` command |
| Gate Detection | Scanner | No | Yes | Consider `stella reachability guards` command |
| Path Witness Generation | ReachGraph | Yes | No | Add witness path visualization in UI |
| Confidence Calculator | Reachability.Core | No | No | Internal implementation - consider exposing in findings |
| Evidence Weighted Score | Signals | No | Partial | Add `stella score explain` command |
| Graph Triple-Layer Storage | ReachGraph | No | No | Ops concern - consider admin commands |
---
## Batch 4: Binary Analysis
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| 4 Fingerprint Algorithm Types | BinaryIndex | `BasicBlockFingerprintGenerator.cs`, `ControlFlowGraphFingerprintGenerator.cs`, `StringRefsFingerprintGenerator.cs` | `stella binary fingerprint` | - | Binary Analysis |
| Alpine Corpus Support | BinaryIndex | `AlpineCorpusConnector.cs` | - | - | Binary Analysis |
| VEX Evidence Bridge | BinaryIndex | `IVexEvidenceGenerator.cs` | - | - | VEX Processing |
| Delta Signature Matching | BinaryIndex | `LookupByDeltaSignatureAsync()` | `stella deltasig` | - | Binary Analysis |
| Symbol Hash Matching | BinaryIndex | `LookupBySymbolHashAsync()` | `stella binary symbols` | - | Binary Analysis |
| Corpus Function Identification | BinaryIndex | `IdentifyFunctionFromCorpusAsync()` | - | - | Binary Analysis |
| Binary Call Graph Extraction | BinaryIndex | `binary callgraph` command | `stella binary callgraph` | - | Binary Analysis |
| 3-Tier Identification Strategy | BinaryIndex | Package/Build-ID/Fingerprint tiers | - | - | Binary Analysis |
| Fingerprint Validation Stats | BinaryIndex | `FingerprintValidationStats.cs` (TP/FP/TN/FN) | - | - | Binary Analysis |
| Changelog CVE Parsing | BinaryIndex | `DebianChangelogParser.cs` (CVE pattern extraction) | - | - | Binary Analysis |
| Secfixes Parsing | BinaryIndex | `ISecfixesParser.cs` (Alpine format) | - | - | Binary Analysis |
| Batch Binary Operations | BinaryIndex | All lookup methods support batching | - | - | Binary Analysis |
| Binary Match Confidence Scoring | BinaryIndex | 0.0-1.0 confidence for all matches | - | - | Binary Analysis |
| Architecture-Aware Filtering | BinaryIndex | Match filtering by architecture | - | - | Binary Analysis |
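
`FingerprintValidationStats.cs` is described only as tracking TP/FP/TN/FN. Those four counts are normally summarized with the standard confusion-matrix metrics, and the Python sketch below computes precision, recall, F1, and false-positive rate from them. Which metrics the BinaryIndex class actually reports is not stated here, so the `ValidationStats` class is hypothetical and the per-field comments are one plausible reading of the counts for function fingerprinting.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Return num/den, or 0.0 when the denominator is zero (no samples)."""
    return num / den if den else 0.0


@dataclass(frozen=True)
class ValidationStats:
    tp: int  # matched the correct function
    fp: int  # matched a wrong function
    tn: int  # correctly reported no match
    fn: int  # missed a function that should have matched

    @property
    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Same value as the harmonic mean of precision and recall.
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)
```

Computing F1 directly from the counts (`2*TP / (2*TP + FP + FN)`) gives the same result as combining precision and recall, with a single zero-division check.
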
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Alpine Corpus | BinaryIndex | No | No | Add to matrix as additional corpus |
| Corpus Ingestion UI | BinaryIndex | No | No | Consider admin UI for corpus management |
| VEX Evidence Bridge | BinaryIndex | No | No | Internal integration - document in VEX section |
| Fingerprint Visualization | BinaryIndex | Yes | No | Consider UI for function fingerprint display |
| Batch Operations | BinaryIndex | No | No | Internal API - consider batch CLI commands |
| Delta Signatures | BinaryIndex | Yes | No | Consider UI integration for patch detection |
---
## Batch 5: Advisory Sources
### Discovered Features (Not in Matrix)
**CRITICAL: The matrix lists 11 sources, but the codebase has 33+ connectors.**
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| **SUSE Connector** | Concelier | `Connector.Distro.Suse/` | `stella db fetch suse` | - | Advisory Sources |
| **Astra Linux Connector** | Concelier | `Connector.Astra/` (FSTEC-certified Russian Linux distribution) | `stella db fetch astra` | - | Advisory Sources |
| **Microsoft MSRC** | Concelier | `vndr.msrc` vendor connector | - | - | Advisory Sources |
| **Oracle Connector** | Concelier | `vndr.oracle` vendor connector | - | - | Advisory Sources |
| **Adobe Connector** | Concelier | `vndr.adobe` vendor connector | - | - | Advisory Sources |
| **Apple Connector** | Concelier | `vndr.apple` vendor connector | - | - | Advisory Sources |
| **Cisco Connector** | Concelier | `vndr.cisco` vendor connector | - | - | Advisory Sources |
| **Chromium Connector** | Concelier | `vndr.chromium` vendor connector | - | - | Advisory Sources |
| **VMware Connector** | Concelier | `vndr.vmware` vendor connector | - | - | Advisory Sources |
| **JVN (Japan) CERT** | Concelier | `Connector.Jvn/` | - | - | Advisory Sources |
| **ACSC (Australia) CERT** | Concelier | `Connector.Acsc/` | - | - | Advisory Sources |
| **CCCS (Canada) CERT** | Concelier | `Connector.Cccs/` | - | - | Advisory Sources |
| **CertFr (France) CERT** | Concelier | `Connector.CertFr/` | - | - | Advisory Sources |
| **CertBund (Germany) CERT** | Concelier | `Connector.CertBund/` | - | - | Advisory Sources |
| **CertCc (US) CERT** | Concelier | `Connector.CertCc/` | - | - | Advisory Sources |
| **CertIn (India) CERT** | Concelier | `Connector.CertIn/` | - | - | Advisory Sources |
| **RU-BDU (Russia) CERT** | Concelier | `Connector.Ru.Bdu/` | - | - | Advisory Sources |
| **RU-NKCKI (Russia) CERT** | Concelier | `Connector.Ru.Nkcki/` | - | - | Advisory Sources |
| **KISA (South Korea) CERT** | Concelier | `Connector.Kisa/` | - | - | Advisory Sources |
| **ICS-CISA (Industrial)** | Concelier | `Connector.Ics.Cisa/` | - | - | Advisory Sources |
| **ICS-Kaspersky (Industrial)** | Concelier | `Connector.Ics.Kaspersky/` | - | - | Advisory Sources |
| **StellaOpsMirror (Internal)** | Concelier | `Connector.StellaOpsMirror/` | - | - | Advisory Sources |
| Backport-Aware Precedence | Concelier | `ConfigurableSourcePrecedenceLattice.cs` | - | - | Advisory Sources |
| Link-Not-Merge Architecture | Concelier | Transitioning from merge to observation/linkset | - | - | Advisory Sources |
| Canonical Deduplication | Concelier | `ICanonicalAdvisoryService`, `CanonicalMerger.cs` | - | - | Advisory Sources |
| Change History Tracking | Concelier | `IChangeHistoryStore` (field-level diffs) | - | - | Advisory Sources |
| Feed Epoch Events | Concelier | `FeedEpochAdvancedEvent` (Provcache invalidation) | - | - | Advisory Sources |
| JSON Exporter | Concelier | `Exporter.Json/` (manifest-driven export) | `stella db export json` | - | Offline & Air-Gap |
| Trivy DB Exporter | Concelier | `Exporter.TrivyDb/` | `stella db export trivy` | - | Offline & Air-Gap |
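
The Change History Tracking row mentions field-level diffs (`IChangeHistoryStore`). A minimal sketch of a field-level diff between two versions of an advisory, in Python: nested objects are walked recursively and each changed leaf is reported with a dotted path. The advisory shape and the `field_diff` helper are hypothetical, and absent fields are reported as `None`.

```python
from typing import Any


def field_diff(
    old: dict[str, Any], new: dict[str, Any], prefix: str = ""
) -> list[tuple[str, Any, Any]]:
    """Hypothetical helper: return (dotted_path, old_value, new_value) per changed leaf."""
    changes: list[tuple[str, Any, Any]] = []
    for key in sorted(set(old) | set(new)):  # sorted for deterministic output
        path = f"{prefix}{key}"
        before, after = old.get(key), new.get(key)
        if isinstance(before, dict) and isinstance(after, dict):
            changes.extend(field_diff(before, after, path + "."))
        elif before != after:
            changes.append((path, before, after))
    return changes
```

Sorting the keys keeps the diff order stable across runs, which matters if diffs are hashed or stored as evidence.
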
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| **22+ Connectors Missing from Matrix** | Concelier | Partial | No | ADD TO MATRIX - major documentation gap |
| Vendor PSIRTs (7 connectors) | Concelier | No | No | Add vendor section to matrix |
| Regional CERTs (11 connectors) | Concelier | No | No | Add regional CERT section to matrix |
| Industrial/ICS (2 connectors) | Concelier | No | No | Add ICS section to matrix |
| Link-Not-Merge Transition | Concelier | No | No | Document new architecture in matrix |
| Backport Precedence | Concelier | No | No | Document in merge engine section |
| Change History | Concelier | No | No | Consider audit trail UI |
### Matrix Update Recommendations
FEATURE_MATRIX.md significantly underrepresents Concelier capabilities:

- **Listed:** 11 sources
- **Actual:** 33+ connectors

Recommended additions:
1. Add "Vendor PSIRTs" section (Microsoft, Oracle, Adobe, Apple, Cisco, Chromium, VMware)
2. Add "Regional CERTs" section (JVN, ACSC, CCCS, CertFr, CertBund, CertIn, RU-BDU, KISA, etc.)
3. Add "Industrial/ICS" section (ICS-CISA, ICS-Kaspersky)
4. Add "Additional Distros" section (SUSE, Astra Linux)
5. Document backport-aware precedence configuration
---
## Batch 6: VEX Processing
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| VEX Consensus Engine (5-state lattice) | VexLens | `VexConsensusEngine.cs`, `IVexConsensusEngine.cs` | `stella vex consensus` | `/vex` | VEX Processing |
| Trust Decay Service | VexLens | `TrustDecayService.cs`, `TrustDecayCalculator.cs` | - | - | VEX Processing |
| Noise Gate Service | VexLens | `NoiseGateService.cs` | - | `/vex` | VEX Processing |
| Consensus Rationale Service | VexLens | `IConsensusRationaleService.cs`, `ConsensusRationaleModels.cs` | - | `/vex` | VEX Processing |
| VEX Linkset Extraction | Excititor | `VexLinksetExtractionService.cs` | - | - | VEX Processing |
| VEX Linkset Disagreement Detection | Excititor | `VexLinksetDisagreementService.cs` | - | `/vex` | VEX Processing |
| VEX Statement Backfill | Excititor | `VexStatementBackfillService.cs` | - | - | VEX Processing |
| VEX Evidence Chunking | Excititor | `VexEvidenceChunkService.cs` | - | - | VEX Processing |
| Auto-VEX Downgrade | Excititor | `AutoVexDowngradeService.cs` | - | - | VEX Processing |
| Risk Feed Service | Excititor | `RiskFeedService.cs`, `RiskFeedEndpoints.cs` | - | - | VEX Processing |
| Trust Calibration Service | Excititor | `TrustCalibrationService.cs` | - | - | VEX Processing |
| VEX Hashing Service (deterministic) | Excititor | `VexHashingService.cs` | - | - | VEX Processing |
| CSAF Provider Connectors (7 total) | Excititor | `Connectors.*.CSAF/` (RedHat, Ubuntu, Oracle, MSRC, Cisco, SUSE) | - | - | VEX Processing |
| OCI OpenVEX Attestation Connector | Excititor | `Connectors.OCI.OpenVEX.Attest/` | - | - | VEX Processing |
| Issuer Key Lifecycle Management | IssuerDirectory | Key create/rotate/revoke endpoints | - | `/issuer-directory` | VEX Processing |
| Issuer Trust Override | IssuerDirectory | Trust override endpoints | - | `/issuer-directory` | VEX Processing |
| CSAF Publisher Bootstrap | IssuerDirectory | `csaf-publishers.json` seeding | - | - | VEX Processing |
| VEX Webhook Distribution | VexHub | `IWebhookService.cs`, `IWebhookSubscriptionRepository.cs` | - | - | VEX Processing |
| VEX Conflict Flagging | VexHub | `IStatementFlaggingService.cs` | - | - | VEX Processing |
| VEX from Drift Generation | CLI | `VexGenCommandGroup.cs` | `stella vex gen --from-drift` | - | VEX Processing |
| VEX Decision Signing | Policy | `VexDecisionSigningService.cs` | - | - | Policy Engine |
| VEX Proof Spine | Policy | `VexProofSpineService.cs` | - | - | Policy Engine |
| Consensus Propagation Rules | VexLens | `IPropagationRuleEngine.cs` | - | - | VEX Processing |
| Consensus Delta Computation | VexLens | `VexDeltaComputeService.cs` | - | - | VEX Processing |
| Triple-Layer Consensus Storage | VexLens | Cache->DB->Archive with `IConsensusProjectionStore.cs` | - | - | Operations |
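
Several VEX rows (`VexLinksetDisagreementService.cs`, `IStatementFlaggingService.cs`) revolve around spotting issuers that disagree about the same vulnerability and product. The Python sketch below shows only the basic grouping step. The `Statement` shape and `find_disagreements` name are hypothetical, the status strings are the four OpenVEX statuses, and it assumes a later statement from the same issuer replaces an earlier one. Trust weighting and consensus are out of scope.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """Hypothetical VEX statement shape."""
    issuer: str
    vuln_id: str
    product: str
    status: str  # "not_affected" | "affected" | "fixed" | "under_investigation"


def find_disagreements(
    statements: list[Statement],
) -> dict[tuple[str, str], dict[str, str]]:
    """Map (vuln_id, product) -> {issuer: status} for keys where issuers disagree."""
    latest: dict[tuple[str, str], dict[str, str]] = defaultdict(dict)
    for s in statements:
        # Later statements from the same issuer overwrite earlier ones.
        latest[(s.vuln_id, s.product)][s.issuer] = s.status
    return {
        key: by_issuer
        for key, by_issuer in latest.items()
        if len(set(by_issuer.values())) > 1
    }
```
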
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| CSAF Provider Connectors | Excititor | No | No | Consider connector status UI in ops |
| Trust Weight Configuration | VexLens | No | Partial | Add `stella vex trust configure` command |
| VEX Distribution Webhooks | VexHub | No | No | Add webhook management UI/CLI |
| Conflict Resolution | VexLens | No | Partial | Interactive conflict resolution needed |
| Issuer Key Management | IssuerDirectory | No | Yes | Add `stella issuer keys` CLI |
| Risk Feed Distribution | Excititor | No | No | Consider risk feed CLI |
| Consensus Replay/Verify | VexLens | No | No | Add `stella vex verify` command |
| VEX Evidence Export | Excititor | No | No | Add `stella vex evidence export` |
### Matrix Update Recommendations
The VEX section of FEATURE_MATRIX.md is significantly underspecified:

- **Listed:** Basic VEX support (OpenVEX, CSAF, CycloneDX)
- **Actual:** Full consensus engine with 5-state lattice, 9 trust factors, 7 CSAF connectors, conflict detection, issuer registry

Recommended additions:
1. Add "VEX Consensus Engine" as major feature (VexLens)
2. Add "Trust Weight Scoring" with 9 factors documented
3. Add "CSAF Provider Connectors" section (7 vendors)
4. Add "Issuer Trust Registry" (IssuerDirectory)
5. Add "VEX Distribution" (VexHub webhooks)
6. Document AOC (Aggregation-Only Contract) compliance
7. Add "VEX from Drift" generation capability
---
## Batch 7: Policy Engine
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| K4 Lattice (Belnap Four-Valued Logic) | Policy | `K4Lattice.cs`, `TrustLatticeEngine.cs`, `ClaimScoreMerger.cs` | - | `/policy` | Policy Engine |
| 10+ Policy Gate Types | Policy | `PolicyGateEvaluator.cs`, various `*Gate.cs` files | - | `/policy` | Policy Engine |
| Uncertainty Score Calculator | Policy.Determinization | `UncertaintyScoreCalculator.cs` (entropy 0.0-1.0) | - | - | Policy Engine |
| Decayed Confidence Calculator | Policy.Determinization | `DecayedConfidenceCalculator.cs` (14-day half-life) | - | - | Policy Engine |
| 6 Evidence Types | Policy.Determinization | `BackportEvidence.cs`, `CvssEvidence.cs`, `EpssEvidence.cs`, etc. | - | - | Policy Engine |
| 6 Risk Score Providers | RiskEngine | `CvssKevProvider.cs`, `EpssProvider.cs`, `FixChainRiskProvider.cs` | - | `/risk` | Scoring & Risk |
| FixChain Risk Metrics | RiskEngine | `FixChainRiskMetrics.cs`, `FixChainRiskDisplay.cs` | - | - | Scoring & Risk |
| Exception Effect Registry | Policy | `ExceptionEffectRegistry.cs`, `ExceptionAdapter.cs` | - | `/policy/exceptions` | Policy Engine |
| Exception Approval Rules | Policy | `IExceptionApprovalRulesService.cs` | - | `/policy/exceptions` | Policy Engine |
| Policy Simulation Service | Policy.Registry | `IPolicySimulationService.cs` | `stella policy simulate` | `/policy/simulate` | Policy Engine |
| Policy Promotion Pipeline | Policy.Registry | `IPromotionService.cs`, `IPublishPipelineService.cs` | - | - | Policy Engine |
| Review Workflow Service | Policy.Registry | `IReviewWorkflowService.cs` | - | - | Policy Engine |
| Sealed Mode Service | Policy | `ISealedModeService.cs` | - | `/ops` | Offline & Air-Gap |
| Verdict Attestation Service | Policy | `IVerdictAttestationService.cs` | - | - | Attestation & Signing |
| Policy Decision Attestation | Policy | `IPolicyDecisionAttestationService.cs` (DSSE/Rekor) | - | - | Attestation & Signing |
| Score Policy YAML Config | Policy | `ScorePolicyModels.cs`, `ScorePolicyLoader.cs` | `stella policy validate` | `/policy` | Policy Engine |
| Profile-Aware Scoring | Policy.Scoring | `ProfileAwareScoringService.cs`, `ScoringProfileService.cs` | - | - | Policy Engine |
| Freshness-Aware Scoring | Policy | `FreshnessAwareScoringService.cs` | - | - | Policy Engine |
| Jurisdiction Trust Rules | Policy.Vex | `JurisdictionTrustRules.cs` | - | - | Policy Engine |
| VEX Customer Override | Policy.Vex | `VexCustomerOverride.cs` | - | - | Policy Engine |
| Attestation Report Service | Policy | `IAttestationReportService.cs` | - | - | Attestation & Signing |
| Risk Scoring Trigger Service | Policy.Scoring | `RiskScoringTriggerService.cs` | - | - | Scoring & Risk |
| Policy Lint Endpoint | Policy | `/policy/lint` | - | - | Policy Engine |
| Policy Determinism Verification | Policy | `/policy/verify-determinism` | - | - | Determinism & Reproducibility |
| AdvisoryAI Knobs Endpoint | Policy | `/policy/advisory-ai/knobs` | - | - | Policy Engine |
| Stability Damping Gate | Policy | `StabilityDampingGate.cs` | - | - | Policy Engine |
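
The Decayed Confidence Calculator row gives a 14-day half-life. Assuming plain exponential decay (the row does not say whether a floor or cap is also applied), confidence after `t` days is `c0 * 0.5 ** (t / 14)`, so evidence loses half its weight every two weeks. A Python sketch with a hypothetical function name:

```python
HALF_LIFE_DAYS = 14.0  # from the Decayed Confidence Calculator row


def decayed_confidence(
    initial: float, age_days: float, half_life_days: float = HALF_LIFE_DAYS
) -> float:
    """Hypothetical helper: exponential decay that halves every `half_life_days`."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return initial * 0.5 ** (age_days / half_life_days)
```

For an initial confidence of 0.9, this gives 0.45 after 14 days and 0.225 after 28 days.
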
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| K4 Lattice Operations | Policy | No | Partial | Add `stella policy lattice explain` for debugging |
| Risk Provider Configuration | RiskEngine | No | No | Provider configuration needs CLI/UI exposure |
| Exception Approval Workflow | Policy | No | Yes | Add `stella policy exception approve/reject` CLI |
| Determinization Signal Weights | Policy | No | No | Allow signal weight tuning via CLI/config |
| Policy Pack Promotion | Policy.Registry | No | Partial | Add `stella policy promote` CLI |
| Score Policy Tuning | Policy.Scoring | Partial | Partial | Expand `stella policy` commands |
| Verdict Attestation Export | Policy | No | No | Add `stella policy verdicts export` |
| Risk Scoring History | RiskEngine | No | Partial | Consider historical trend CLI |
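
The K4 lattice entries (`K4Lattice.cs`, `ClaimScoreMerger.cs`) refer to Belnap's four-valued logic, where a claim can be true, false, unknown (no evidence), or both (conflicting evidence). A common encoding, used in the Python sketch below, represents each value as a pair of flags (evidence for, evidence against). Merging claims from different sources is a component-wise OR (the knowledge join), while AND/OR follow the truth ordering. This is the textbook construction, not a copy of `K4Lattice.cs`; the `K4` class and `merge_claims` helper are hypothetical.

```python
from enum import Enum
from functools import reduce


class K4(Enum):
    """Hypothetical Belnap value; value = (evidence_for_true, evidence_for_false)."""
    NONE = (False, False)   # no information (bottom of the knowledge order)
    TRUE = (True, False)
    FALSE = (False, True)
    BOTH = (True, True)     # contradictory information (top of the knowledge order)

    def join(self, other: "K4") -> "K4":
        """Knowledge join: accumulate the evidence of both claims."""
        (t1, f1), (t2, f2) = self.value, other.value
        return K4((t1 or t2, f1 or f2))

    def and_(self, other: "K4") -> "K4":
        """Truth-order meet (conjunction)."""
        (t1, f1), (t2, f2) = self.value, other.value
        return K4((t1 and t2, f1 or f2))

    def or_(self, other: "K4") -> "K4":
        """Truth-order join (disjunction)."""
        (t1, f1), (t2, f2) = self.value, other.value
        return K4((t1 or t2, f1 and f2))

    def negate(self) -> "K4":
        t, f = self.value
        return K4((f, t))


def merge_claims(claims: list[K4]) -> K4:
    """Fold independent source claims together, starting from NONE."""
    return reduce(K4.join, claims, K4.NONE)
```

Because `BOTH` absorbs every further join, a single contradicting source is never silently outvoted, which is what makes this lattice useful for surfacing conflicts instead of hiding them.
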
### Matrix Update Recommendations
The Policy section of FEATURE_MATRIX.md covers the basics but misses advanced features:

- **Listed:** Basic policy evaluation, exceptions
- **Actual:** Full K4 lattice, 10+ gate types, 6 risk providers, determinization system

Recommended additions:
1. Add "K4 Lattice Logic" as core feature (Belnap four-valued logic)
2. Add "Policy Gate Types" section (10+ specialized gates)
3. Add "Risk Score Providers" section (6 providers with distinct purposes)
4. Add "Determinization System" (signal weights, decay, uncertainty)
5. Add "Score Policy Configuration" (YAML-based policy tuning)
6. Add "Policy Simulation" as distinct feature
7. Add "Verdict Attestations" (DSSE/Rekor integration)
8. Document "Sealed Mode" for air-gap operations
---
## Batch 8: Attestation & Signing
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| 25+ Predicate Types | Attestor | `StellaOps.Attestor.ProofChain/Predicates/` | - | - | Attestation & Signing |
| Keyless Signing (Fulcio) | Signer | `KeylessDsseSigner.cs`, `HttpFulcioClient.cs` | `stella sign keyless` | - | Attestation & Signing |
| Ephemeral Key Generation | Signer.Keyless | `EphemeralKeyGenerator.cs`, `EphemeralKeyPair.cs` | - | - | Attestation & Signing |
| OIDC Token Provider | Signer.Keyless | `IOidcTokenProvider.cs`, `AmbientOidcTokenProvider.cs` | - | - | Attestation & Signing |
| Key Rotation Service | Signer.KeyManagement | `IKeyRotationService.cs`, `KeyRotationService.cs` | - (API only: `/keys/rotate`) | - | Attestation & Signing |
| Trust Anchor Manager | Signer.KeyManagement | `ITrustAnchorManager.cs`, `TrustAnchorManager.cs` | - | - | Attestation & Signing |
| Delta Attestations (4 types) | Attestor | `IDeltaAttestationService.cs` (VEX/SBOM/Verdict/Reachability) | - | - | Attestation & Signing |
| Layer Attestation Service | Attestor | `ILayerAttestationService.cs` | - | - | Attestation & Signing |
| Attestation Chain Builder | Attestor | `AttestationChainBuilder.cs`, `AttestationChainValidator.cs` | - | - | Attestation & Signing |
| Attestation Link Store | Attestor | `IAttestationLinkStore.cs`, `IAttestationLinkResolver.cs` | - | - | Attestation & Signing |
| Rekor Submission Queue | Attestor | `IRekorSubmissionQueue.cs` (durable retry) | - | - | Attestation & Signing |
| Cached Verification Service | Attestor | `CachedAttestorVerificationService.cs` | - | - | Attestation & Signing |
| Offline Bundle Service | Attestor | `IAttestorBundleService.cs` | - | `/ops/offline-kit` | Offline & Air-Gap |
| Signer Quota Service | Signer | `ISignerQuotaService.cs` | - | - | Operations |
| Signer Audit Sink | Signer | `ISignerAuditSink.cs`, `InMemorySignerAuditSink.cs` | - | - | Operations |
| Proof of Entitlement | Signer | `IProofOfEntitlementIntrospector.cs` (JWT/mTLS) | - | - | Auth & Access Control |
| Release Integrity Verifier | Signer | `IReleaseIntegrityVerifier.cs` | - | - | Attestation & Signing |
| JSON Canonicalizer (RFC 8785) | Attestor | `JsonCanonicalizer.cs` | - | - | Determinism & Reproducibility |
| Predicate Type Router | Attestor | `IPredicateTypeRouter.cs`, `PredicateTypeRouter.cs` | - | - | Attestation & Signing |
| Standard Predicate Registry | Attestor | `IStandardPredicateRegistry.cs` | - | - | Attestation & Signing |
| HMAC Signing | Signer | `HmacDsseSigner.cs` | - | - | Attestation & Signing |
| SM2 Algorithm Support | Signer | `CryptoDsseSigner.cs` (SM2 branch) | - | - | Regional Crypto |
| Promotion Attestation | Provenance | `PromotionAttestation.cs` | - | - | Release Orchestration |
| Cosign/KMS Signer | Provenance | `CosignAndKmsSigner.cs` | - | - | Attestation & Signing |
| Rotating Signer | Provenance | `RotatingSigner.cs` | - | - | Attestation & Signing |
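
Several Signer rows (`HmacDsseSigner.cs`, `KeylessDsseSigner.cs`) produce DSSE envelopes. Whatever the key type, DSSE signs the pre-authentication encoding (PAE) of the payload type and payload rather than the raw payload. The Python sketch below implements PAE as defined in the DSSE v1 protocol, with HMAC-SHA256 standing in as the signature algorithm. The function names and the choice of SHA-256 are assumptions, not details taken from `HmacDsseSigner.cs`.

```python
import base64
import hashlib
import hmac


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding; lengths are ASCII decimal byte counts."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def sign_envelope(payload_type: str, payload: bytes, key: bytes, keyid: str) -> dict:
    # Hypothetical signer: HMAC-SHA256 over PAE, emitted as a DSSE envelope.
    sig = hmac.new(key, pae(payload_type, payload), hashlib.sha256).digest()
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope: dict, key: bytes) -> bool:
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    # Constant-time comparison against every signature in the envelope.
    return any(
        hmac.compare_digest(expected, base64.b64decode(s["sig"]))
        for s in envelope["signatures"]
    )
```

Binding the payload type into the signed bytes prevents an envelope signed as one predicate type from being replayed as another.
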
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Key Rotation | Signer | No | No | Add `stella keys rotate` CLI command |
| Trust Anchor Management | Signer | No | No | Add `stella trust-anchors` commands |
| Attestation Chain Visualization | Attestor | No | Partial | Add chain visualization UI |
| Predicate Registry Browser | Attestor | No | No | Add `stella attest predicates list` |
| Delta Attestation CLI | Attestor | No | No | Add `stella attest delta` commands |
| Signer Audit Logs | Signer | No | No | Add `stella sign audit` command |
| Rekor Submission Status | Attestor | No | No | Add submission queue status UI |
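
The JSON Canonicalizer row cites RFC 8785 (JSON Canonicalization Scheme), which is what makes hashes and signatures over JSON reproducible. Two of its rules are easy to get wrong: object members are sorted by UTF-16 code units rather than by code point, and the output has no insignificant whitespace. The Python sketch below implements those rules for objects, arrays, strings, booleans, null, and integers. It deliberately rejects floats, because RFC 8785 requires ECMAScript number formatting that Python's `repr` does not reproduce. The `canonicalize` name is hypothetical and this is not `JsonCanonicalizer.cs`.

```python
import json
from typing import Any

MAX_SAFE_INT = 2**53  # integers beyond this are not exact as IEEE 754 doubles


def _utf16_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return s.encode("utf-16-be")


def canonicalize(value: Any) -> str:
    """Hypothetical RFC 8785-style canonicalizer for a restricted domain (no floats)."""
    if isinstance(value, dict):
        members = sorted(value.items(), key=lambda kv: _utf16_key(kv[0]))
        return "{" + ",".join(
            json.dumps(k, ensure_ascii=False) + ":" + canonicalize(v) for k, v in members
        ) + "}"
    if isinstance(value, list):
        return "[" + ",".join(canonicalize(v) for v in value) + "]"
    if value is None or isinstance(value, bool):  # bool must be checked before int
        return json.dumps(value)
    if isinstance(value, int):
        if abs(value) > MAX_SAFE_INT:
            raise ValueError("integer outside the exactly representable double range")
        return str(value)
    if isinstance(value, str):
        # json.dumps emits \n, \t, \", \\ and lowercase \u00xx escapes, as RFC 8785 expects.
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")
```

Hashing `canonicalize(doc).encode("utf-8")` then yields the same digest regardless of the key order in the input document.
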
### Matrix Update Recommendations
The Attestation section of FEATURE_MATRIX.md lists only basic DSSE/in-toto support:

- **Listed:** Basic attestation attach/verify, SLSA provenance
- **Actual:** 25+ predicate types, keyless signing, key rotation, attestation chains

Recommended additions:
1. Add "Predicate Types" section (25+ types documented)
2. Add "Keyless Signing (Sigstore)" as major feature
3. Add "Key Rotation Service" for Enterprise tier
4. Add "Trust Anchor Management" for Enterprise tier
5. Add "Attestation Chains" feature
6. Add "Delta Attestations" (VEX/SBOM/Verdict/Reachability)
7. Document "Offline Bundle Service" for air-gap
8. Add "SM2 Algorithm Support" in Regional Crypto section
---
## Batch 9: Regional Crypto
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| 8 Signature Profiles | Cryptography | `SignatureProfile.cs` | - | - | Regional Crypto |
| Ed25519 Baseline Signing | Cryptography | `Ed25519Signer.cs`, `Ed25519Verifier.cs` | - | - | Regional Crypto |
| ECDSA P-256 Profile | Cryptography | `EcdsaP256Signer.cs` | - | - | Regional Crypto |
| FIPS 140-2 Plugin | Cryptography | `FipsPlugin.cs` | - | - | Regional Crypto |
| GOST R 34.10-2012 Plugin | Cryptography | `GostPlugin.cs` | - | - | Regional Crypto |
| SM2/SM3/SM4 Plugin | Cryptography | `SmPlugin.cs` | - | - | Regional Crypto |
| eIDAS Plugin (CAdES/XAdES) | Cryptography | `EidasPlugin.cs` | - | - | Regional Crypto |
| HSM Plugin (PKCS#11) | Cryptography | `HsmPlugin.cs` (simulated + production) | - | - | Regional Crypto |
| CryptoPro GOST (Windows) | Cryptography | `CryptoProGostCryptoProvider.cs` | - | - | Regional Crypto |
| Multi-Profile Signing | Cryptography | `MultiProfileSigner.cs` | - | - | Regional Crypto |
| SM Remote Service | SmRemote | `Program.cs` | - | - | Regional Crypto |
| Post-Quantum Profiles (Defined) | Cryptography | `SignatureProfile.cs` (Dilithium, Falcon) | - | - | Regional Crypto |
| RFC 3161 TSA Integration | Cryptography | `EidasPlugin.cs` | - | - | Regional Crypto |
| Simulated HSM Client | Cryptography | `SimulatedHsmClient.cs` | - | - | Regional Crypto |
| GOST Block Cipher (28147-89) | Cryptography | `GostPlugin.cs` | - | - | Regional Crypto |
| SM4 Encryption (CBC/ECB/GCM) | Cryptography | `SmPlugin.cs` | - | - | Regional Crypto |
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Crypto Profile Selection | Cryptography | No | No | Add `stella crypto profiles` command |
| Plugin Health Check | Cryptography | No | No | Add plugin status endpoint |
| Key Management CLI | Cryptography | No | No | Add `stella keys` commands |
| HSM Status | Cryptography | No | No | Add HSM health monitoring |
| Post-Quantum Implementation | Cryptography | No | No | Implement Dilithium/Falcon when stable |
### Matrix Update Recommendations
The FEATURE_MATRIX.md Regional Crypto section mentions only FIPS/eIDAS/GOST:
- **Listed:** Basic regional compliance mentions
- **Actual:** 8 signature profiles, 6 plugins, HSM support, post-quantum readiness
Recommended additions:
1. Add "Signature Profiles" section (8 profiles documented)
2. Add "Plugin Architecture" description
3. Add "Multi-Profile Signing" capability (dual-stack signatures)
4. Add "SM Remote Service" for Chinese market
5. Add "Post-Quantum Readiness" (Dilithium and Falcon profiles defined, not yet implemented)
6. Add "HSM Integration" (PKCS#11 + simulation)
7. Document plugin configuration options
8. Add "CryptoPro GOST" for Windows environments
---
## Batch 10: Evidence & Findings
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| WORM Storage (S3 Object Lock) | EvidenceLocker | `S3EvidenceObjectStore.cs` | - | - | Evidence & Findings |
| Verdict Attestations (DSSE) | EvidenceLocker | `VerdictEndpoints.cs`, `VerdictContracts.cs` | - | `/evidence-export` | Evidence & Findings |
| Append-Only Ledger Events | Findings | `ILedgerEventRepository.cs`, `LedgerEventModels.cs` | - | `/findings` | Evidence & Findings |
| Alert Triage Bands (hot/warm/cold) | Findings | `DecisionModels.cs` | - | `/findings` | Evidence & Findings |
| Merkle Anchoring | Findings | `Infrastructure/Merkle/` | - | - | Evidence & Findings |
| Evidence Holds (Legal) | EvidenceLocker | `EvidenceHold.cs` | - | - | Evidence & Findings |
| Evidence Pack Service | Evidence.Pack | `IEvidencePackService.cs`, `EvidencePack.cs` | - | `/evidence-thread` | Evidence & Findings |
| Evidence Card Service | Evidence.Pack | `IEvidenceCardService.cs`, `EvidenceCard.cs` | - | - | Evidence & Findings |
| Profile-Based Export | ExportCenter | `ExportApiEndpoints.cs`, `ExportProfile` | - | `/evidence-export` | Evidence & Findings |
| Risk Bundle Export | ExportCenter | `RiskBundleEndpoints.cs` | - | `/evidence-export` | Evidence & Findings |
| Audit Bundle Export | ExportCenter | `AuditBundleEndpoints.cs` | - | - | Evidence & Findings |
| Lineage Evidence Export | ExportCenter | `LineageExportEndpoints.cs` | - | `/lineage` | Evidence & Findings |
| SSE Export Streaming | ExportCenter | Real-time run events | - | - | Evidence & Findings |
| Incident Mode | Findings | `IIncidentModeState.cs` | - | - | Evidence & Findings |
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Evidence Holds | EvidenceLocker | No | No | Add legal hold management CLI |
| Audit Bundle Export | ExportCenter | No | Partial | Add `stella export audit` command |
| Incident Mode | Findings | No | No | Add `stella findings incident` commands |
---
## Batch 11: Determinism & Replay
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| Hybrid Logical Clock | HybridLogicalClock | `HybridLogicalClock.cs`, `HlcTimestamp.cs` | - | - | Determinism & Replay |
| HLC State Persistence | HybridLogicalClock | `IHlcStateStore.cs` | - | - | Determinism & Replay |
| Canonical JSON (RFC 8785) | Canonical.Json | `CanonJson.cs`, `CanonVersion.cs` | - | - | Determinism & Replay |
| Replay Manifests V1/V2 | Replay.Core | `ReplayManifest.cs` | `stella scan replay` | - | Determinism & Replay |
| Knowledge Snapshots | Replay.Core | `KnowledgeSnapshot.cs` | - | - | Determinism & Replay |
| Replay Proofs (DSSE) | Replay.Core | `ReplayProof.cs` | `stella prove` | - | Determinism & Replay |
| Evidence Weighted Scoring (6 factors) | Signals | `EvidenceWeightedScoreCalculator.cs` | - | - | Scoring & Risk |
| Score Buckets (ActNow/ScheduleNext/Investigate/Watchlist) | Signals | Scoring algorithm | - | - | Scoring & Risk |
| Attested Reduction (short-circuit) | Signals | VEX anchoring logic | - | - | Scoring & Risk |
| Timeline Events | Eventing | `TimelineEvent.cs`, `ITimelineEventEmitter.cs` | - | - | Determinism & Replay |
| Deterministic Event IDs | Eventing | `EventIdGenerator.cs` (SHA-256) | - | - | Determinism & Replay |
| Transactional Outbox | Eventing | `TimelineOutboxProcessor.cs` | - | - | Determinism & Replay |
| Event Signing (DSSE) | Eventing | `IEventSigner.cs` | - | - | Determinism & Replay |
| Replay Bundle Writer | Replay.Core | `StellaReplayBundleWriter.cs` (tar.zst) | - | - | Determinism & Replay |
| Dead Letter Replay | Orchestrator | `IReplayManager.cs`, `ReplayManager.cs` | - | - | Operations |
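A Hybrid Logical Clock pairs the largest physical timestamp seen so far with a logical counter, so events stay totally ordered even when a wall clock stalls or steps backwards. The Bash sketch below implements the standard published HLC update rules for illustration. It is not taken from `HybridLogicalClock.cs`, and the names `hlc_l`, `hlc_c`, `hlc_tick`, and `hlc_recv` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative Hybrid Logical Clock (HLC) using the standard HLC update rules.
# State: hlc_l = largest physical time observed, hlc_c = logical counter.
# Physical times are plain integers (e.g. epoch milliseconds) passed in by the caller.
hlc_l=0
hlc_c=0

# hlc_tick PT: local or send event observed at physical time PT.
hlc_tick() {
  local pt="$1" old_l="$hlc_l"
  if (( pt > hlc_l )); then hlc_l=$pt; fi
  if (( hlc_l == old_l )); then
    hlc_c=$(( hlc_c + 1 ))   # physical time did not advance: bump the counter
  else
    hlc_c=0                  # physical time advanced: reset the counter
  fi
}

# hlc_recv PT ML MC: receive a message stamped (ML, MC) at physical time PT.
hlc_recv() {
  local pt="$1" ml="$2" mc="$3" old_l="$hlc_l" new_l="$hlc_l"
  if (( ml > new_l )); then new_l=$ml; fi
  if (( pt > new_l )); then new_l=$pt; fi
  if (( new_l == old_l && new_l == ml )); then
    hlc_c=$(( (hlc_c > mc ? hlc_c : mc) + 1 ))   # tie on both sides: exceed both counters
  elif (( new_l == old_l )); then
    hlc_c=$(( hlc_c + 1 ))                       # local time still dominates
  elif (( new_l == ml )); then
    hlc_c=$(( mc + 1 ))                          # remote time dominates
  else
    hlc_c=0                                      # physical time dominates
  fi
  hlc_l=$new_l
}
```

Timestamps compare as `(hlc_l, hlc_c)` pairs. For example, after `hlc_tick 100; hlc_tick 100; hlc_tick 90` the clock reads `(100, 2)`, even though the wall clock moved backwards.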
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| HLC Inspection | HybridLogicalClock | No | No | Add `stella hlc status` command |
| Timeline Events | Eventing | No | No | Add `stella timeline query` command |
| Scoring Explanation | Signals | No | No | Add `stella score explain` command |
---
## Batch 12: Operations
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| Impact Index (Roaring bitmaps) | Scheduler | `IImpactIndex.cs` | - | - | Operations |
| Graph Build/Overlay Jobs | Scheduler | `IGraphJobService.cs` | - | `/ops/scheduler` | Operations |
| Run Preview (dry-run) | Scheduler | `RunEndpoints.cs` | - | - | Operations |
| SSE Run Streaming | Scheduler | `/runs/{runId}/stream` | - | - | Operations |
| Job Repository | Orchestrator | `IJobRepository.cs`, `Job.cs` | - | `/orchestrator` | Operations |
| Lease Management | Orchestrator | `LeaseNextAsync()`, `ExtendLeaseAsync()` | - | - | Operations |
| Dead Letter Classification | Orchestrator | `DeadLetterEntry.cs` | - | `/orchestrator` | Operations |
| First Signal Service | Orchestrator | `IFirstSignalService.cs` | - | - | Operations |
| Task Pack Execution | TaskRunner | `ITaskRunnerClient.cs` | - | - | Operations |
| Plan-Hash Binding | TaskRunner | Deterministic validation | - | - | Operations |
| Approval Gates | TaskRunner | `ApprovalDecisionRequest.cs` | - | - | Operations |
| Artifact Capture | TaskRunner | Digest tracking | - | - | Operations |
| Timeline Query Service | TimelineIndexer | `ITimelineQueryService.cs` | - | - | Operations |
| Timeline Ingestion | TimelineIndexer | `ITimelineIngestionService.cs` | - | - | Operations |
| Token-Bucket Rate Limiting | Orchestrator | Adaptive refill per tenant | - | - | Operations |
| Job Watermarks | Orchestrator | Ordering guarantees | - | - | Operations |
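A token bucket admits a request only when a token is available, and it refills tokens at a fixed rate up to a capacity. The Bash sketch below shows that basic mechanism for a single bucket with a fixed refill rate. It does not model the per-tenant adaptive refill listed above. `CAPACITY`, `REFILL_PER_SEC`, and `try_acquire` are illustrative names, not Orchestrator APIs.

```bash
#!/usr/bin/env bash
# Minimal fixed-rate token bucket (one bucket, integer tokens, whole-second clock).
# CAPACITY and REFILL_PER_SEC are illustrative values, not Orchestrator defaults.
CAPACITY=5
REFILL_PER_SEC=1
tokens=$CAPACITY
last_refill=0

# try_acquire NOW: refill for the seconds elapsed since the last refill,
# then consume one token. Returns 0 if admitted, 1 if rate-limited.
# State lives in globals, so call it in the current shell, not inside $(...).
try_acquire() {
  local now="$1"
  local elapsed=$(( now - last_refill ))
  if (( elapsed > 0 )); then
    tokens=$(( tokens + elapsed * REFILL_PER_SEC ))
    if (( tokens > CAPACITY )); then tokens=$CAPACITY; fi   # never exceed capacity
    last_refill=$now
  fi
  if (( tokens > 0 )); then
    tokens=$(( tokens - 1 ))
    return 0
  fi
  return 1
}
```

Passing the clock in explicitly keeps the function deterministic and easy to test. Five calls at `now=0` drain the bucket and a sixth is rejected. Two seconds later, the bucket has refilled enough to admit two more requests.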
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Impact Preview | Scheduler | No | Partial | Add `stella scheduler preview` command |
| Job Management | Orchestrator | No | Yes | Add `stella orchestrator jobs` commands |
| Dead Letter Operations | Orchestrator | No | Yes | Add `stella orchestrator deadletter` commands |
| TaskRunner CLI | TaskRunner | No | No | Add `stella taskrunner` commands |
| Timeline Query CLI | TimelineIndexer | No | No | Add `stella timeline` commands |
---
## Batch 13: Release Orchestration
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| Environment Bundles | ReleaseOrchestrator | `IEnvironmentBundleService.cs`, `EnvironmentBundle.cs` | - | `/releases` | Release Orchestration |
| Promotion Workflows | ReleaseOrchestrator | `IPromotionWorkflowService.cs`, `PromotionRequest.cs` | - | `/releases` | Release Orchestration |
| Rollback Service | ReleaseOrchestrator | `IRollbackService.cs`, `RollbackRequest.cs` | - | `/releases` | Release Orchestration |
| Deployment Agents (Docker/Compose/ECS/Nomad) | ReleaseOrchestrator | `IDeploymentAgent.cs`, various agent implementations | - | `/releases` | Release Orchestration |
| Progressive Delivery (A/B, Canary) | ReleaseOrchestrator | `IProgressiveDeliveryService.cs` | - | `/releases` | Release Orchestration |
| Hook System (Pre/Post Deploy) | ReleaseOrchestrator | `IHookExecutionService.cs`, `Hook.cs` | - | `/releases` | Release Orchestration |
| Approval Gates (Multi-Stage) | ReleaseOrchestrator | `IApprovalGateService.cs`, `ApprovalGate.cs` | - | `/releases` | Release Orchestration |
| Release Bundle Signing | ReleaseOrchestrator | `IReleaseBundleSigningService.cs` | - | - | Release Orchestration |
| Environment Promotion History | ReleaseOrchestrator | `IPromotionHistoryService.cs` | - | `/releases` | Release Orchestration |
| Deployment Lock Service | ReleaseOrchestrator | `IDeploymentLockService.cs` | - | - | Release Orchestration |
| Release Manifest Generation | ReleaseOrchestrator | `IReleaseManifestService.cs` | - | - | Release Orchestration |
| Promotion Attestations | ReleaseOrchestrator | `PromotionAttestation.cs` | - | - | Attestation & Signing |
| Environment Health Checks | ReleaseOrchestrator | `IEnvironmentHealthService.cs` | - | `/releases` | Release Orchestration |
| Deployment Verification Tests | ReleaseOrchestrator | `IVerificationTestService.cs` | - | - | Release Orchestration |
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Release Bundle Creation | ReleaseOrchestrator | No | Partial | Add `stella release create` command |
| Environment Promotion | ReleaseOrchestrator | No | Yes | Add `stella release promote` command |
| Rollback Operations | ReleaseOrchestrator | No | Yes | Add `stella release rollback` command |
| Hook Management | ReleaseOrchestrator | No | Partial | Add `stella release hooks` commands |
| Deployment Agent Status | ReleaseOrchestrator | No | Partial | Add `stella agent status` command |
### Matrix Update Recommendations
The FEATURE_MATRIX.md Release Orchestration section still describes most capabilities as planned:
- **Listed:** Basic environment management concepts
- **Actual:** Full promotion workflow, deployment agents, progressive delivery
Recommended additions:
1. Add "Deployment Agents" section (Docker, Compose, ECS, Nomad)
2. Add "Progressive Delivery" (A/B, Canary strategies)
3. Add "Approval Gates" (multi-stage approvals)
4. Add "Hook System" (pre/post deployment hooks)
5. Add "Promotion Attestations" (DSSE signing of promotions)
6. Document "Environment Health Checks"
---
## Batch 14: Auth & Access Control
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| 75+ Authorization Scopes | Authority | `AuthorizationScopeConstants.cs` | - | `/admin/roles` | Auth & Access Control |
| DPoP Sender Constraints | Authority | `DPoPService.cs`, `DPoPValidator.cs` | - | - | Auth & Access Control |
| mTLS Sender Constraints | Authority | `MtlsClientCertificateValidator.cs` | - | - | Auth & Access Control |
| Device Authorization Flow | Authority | `DeviceAuthorizationEndpoints.cs` | - | `/login` | Auth & Access Control |
| JWT Profile for OAuth | Authority | `JwtBearerClientAssertionValidator.cs` | - | - | Auth & Access Control |
| PAR (Pushed Authorization Requests) | Authority | `ParEndpoints.cs` | - | - | Auth & Access Control |
| Tenant Isolation | Authority | `ITenantContext.cs`, `TenantResolutionMiddleware.cs` | - | - | Auth & Access Control |
| Role-Based Access Control | Authority | `IRoleService.cs`, `Role.cs` | - | `/admin/roles` | Auth & Access Control |
| Permission Grant Service | Authority | `IPermissionGrantService.cs` | - | - | Auth & Access Control |
| Token Introspection | Authority | `TokenIntrospectionEndpoints.cs` | - | - | Auth & Access Control |
| Token Revocation | Authority | `TokenRevocationEndpoints.cs` | - | - | Auth & Access Control |
| OAuth Client Management | Authority | `IClientRepository.cs`, `Client.cs` | - | `/admin/clients` | Auth & Access Control |
| User Federation (LDAP/SAML) | Authority | `IFederationProvider.cs` | - | `/admin/federation` | Auth & Access Control |
| Session Management | Authority | `ISessionStore.cs`, `Session.cs` | - | - | Auth & Access Control |
| Consent Management | Authority | `IConsentStore.cs`, `Consent.cs` | - | `/consent` | Auth & Access Control |
| Registry Token Service | Registry | `ITokenService.cs`, `TokenModels.cs` | `stella registry login` | - | Auth & Access Control |
| Scope-Based Token Minting | Registry | Pull/push/catalog scope handling | - | - | Auth & Access Control |
| Token Refresh Flow | Authority | Refresh token rotation | - | - | Auth & Access Control |
| Multi-Factor Authentication | Authority | `IMfaService.cs` | - | `/login/mfa` | Auth & Access Control |
| API Key Management | Authority | `IApiKeyService.cs` | - | `/admin/api-keys` | Auth & Access Control |
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Scope Management | Authority | No | Yes | Add `stella auth scopes` commands |
| DPoP Configuration | Authority | No | No | Add DPoP configuration documentation |
| Client Management | Authority | No | Yes | Add `stella auth clients` commands |
| Role Management | Authority | No | Yes | Add `stella auth roles` commands |
| API Key Operations | Authority | No | Yes | Add `stella auth api-keys` commands |
| Token Introspection | Authority | No | No | Add `stella auth token inspect` command |
### Matrix Update Recommendations
The FEATURE_MATRIX.md Auth section covers basics but misses advanced features:
- **Listed:** Basic OAuth/OIDC, RBAC
- **Actual:** 75+ scopes, DPoP/mTLS, federation, advanced OAuth flows
Recommended additions:
1. Add "Authorization Scopes" section (75+ granular scopes)
2. Add "Sender Constraints" (DPoP, mTLS)
3. Add "Device Authorization Flow" for CLI/IoT
4. Add "User Federation" (LDAP, SAML integration)
5. Add "PAR Support" for security-conscious clients
6. Add "Multi-Factor Authentication"
7. Add "API Key Management" for service accounts
8. Document "Tenant Isolation" architecture
---
## Batch 15: Notifications & Integrations
### Discovered Features (Not in Matrix)
| Feature | Module | Key Files | CLI | UI | Suggested Category |
|---------|--------|-----------|-----|----|--------------------|
| 10 Notification Channel Types | Notify | Email, Slack, Teams, Webhook, PagerDuty, SNS, SQS, Pub/Sub, Discord, Matrix | - | `/notifications` | Notifications |
| Template-Based Notifications | Notify | `INotificationTemplateService.cs`, `NotificationTemplate.cs` | - | `/notifications` | Notifications |
| Channel Routing Rules | Notify | `IChannelRoutingService.cs`, `RoutingRule.cs` | - | `/notifications` | Notifications |
| Delivery Receipt Tracking | Notify | `IDeliveryReceiptService.cs`, `DeliveryReceipt.cs` | - | - | Notifications |
| Notification Preferences | Notify | `IPreferenceService.cs`, `UserPreference.cs` | - | `/settings` | Notifications |
| Digest/Batch Notifications | Notify | `IDigestService.cs` | - | `/notifications` | Notifications |
| Kubernetes Admission Webhooks | Zastava | `AdmissionWebhookEndpoints.cs` | - | - | Integrations |
| OCI Registry Push Hooks | Zastava | `IWebhookProcessor.cs`, `RegistryPushEvent.cs` | - | - | Integrations |
| Scan-on-Push Trigger | Zastava | Auto-trigger scanning on registry push | - | - | Integrations |
| SCM Webhooks (GitHub/GitLab/Bitbucket) | Integrations | `IScmWebhookHandler.cs` | - | `/integrations` | Integrations |
| CI/CD Webhooks | Integrations | Jenkins, CircleCI, GitHub Actions integration | - | `/integrations` | Integrations |
| Issue Tracker Integration | Integrations | Jira, GitHub Issues, Linear integration | - | `/integrations` | Integrations |
| Slack App Integration | Integrations | `ISlackAppService.cs`, slash commands | - | `/integrations` | Integrations |
| MS Teams App Integration | Integrations | `ITeamsAppService.cs`, adaptive cards | - | `/integrations` | Integrations |
| Notification Studio | Notifier | Template design and preview | - | `/notifications/studio` | Notifications |
| Escalation Rules | Notify | `IEscalationService.cs` | - | `/notifications` | Notifications |
| On-Call Schedule Integration | Notify | PagerDuty, OpsGenie integration | - | `/notifications` | Notifications |
| Webhook Retry Logic | Notify | Exponential backoff, dead letter | - | - | Notifications |
| Event-Driven Notifications | Notify | Timeline event subscription | - | - | Notifications |
| Custom Webhook Payloads | Integrations | `IWebhookPayloadFormatter.cs` | - | `/integrations` | Integrations |
### Coverage Gaps
| Feature | Module | Has CLI | Has UI | Recommendation |
|---------|--------|---------|--------|----------------|
| Channel Configuration | Notify | No | Yes | Add `stella notify channels` commands |
| Template Management | Notify | No | Yes | Add `stella notify templates` commands |
| Webhook Testing | Integrations | No | Partial | Add `stella integrations test` command |
| K8s Webhook Installation | Zastava | No | No | Add `stella zastava install` command |
| Notification Preferences | Notify | No | Yes | Add `stella notify preferences` commands |
### Matrix Update Recommendations
The FEATURE_MATRIX.md Notifications section is basic:
- **Listed:** Basic webhook/email notifications
- **Actual:** 10 channel types, template engine, routing rules, escalation
Recommended additions:
1. Add "Notification Channels" section (10 types)
2. Add "Template Engine" for customizable messages
3. Add "Channel Routing" for rule-based delivery to specific channels
4. Add "Escalation Rules" for incident response
5. Add "Notification Studio" for template design
6. Add "Kubernetes Admission Webhooks" (Zastava)
7. Add "SCM Integrations" (GitHub, GitLab, Bitbucket)
8. Add "CI/CD Integrations" (Jenkins, CircleCI, GitHub Actions)
9. Add "Issue Tracker Integration" (Jira, GitHub Issues)
10. Document "Scan-on-Push" auto-trigger
---
## Summary: Overall Matrix Gaps
### Major Documentation Gaps Identified
| Category | Matrix Coverage | Actual Coverage | Gap Severity |
|----------|-----------------|-----------------|--------------|
| Advisory Sources | 11 sources | 33+ connectors | **CRITICAL** |
| VEX Processing | Basic | Full consensus engine | **HIGH** |
| Attestation & Signing | Basic | 25+ predicates | **HIGH** |
| Auth Scopes | Basic RBAC | 75+ granular scopes | **HIGH** |
| Policy Engine | Basic | K4 lattice, 10+ gates | **MEDIUM** |
| Regional Crypto | 3 profiles | 8 profiles, 6 plugins | **MEDIUM** |
| Notifications | 2 channels | 10 channels | **MEDIUM** |
| Binary Analysis | Basic | 4 fingerprint algorithms | **MEDIUM** |
| Release Orchestration | Planned | Partially implemented | **LOW** |
### CLI/UI Coverage Statistics
| Metric | Value |
|--------|-------|
| Features with CLI | ~65% |
| Features with UI | ~70% |
| Features with both | ~55% |
| Internal-only features | ~25% |
### Recommended Next Steps
1. **Immediate**: Update Advisory Sources section (33+ connectors in code, only 11 listed in the matrix)
2. **High Priority**: Document VEX consensus engine capabilities
3. **High Priority**: Document attestation predicate types
4. **Medium Priority**: Update auth scopes documentation
5. **Medium Priority**: Complete policy engine documentation
6. **Low Priority**: Document internal operations features

@@ -0,0 +1,230 @@
# Agent Operations Quick Start
This guide covers deploying, configuring, and maintaining Stella Ops agents at scale.
## Zero-Touch Bootstrap
Deploy agents with a single command using bootstrap tokens.
### Generate Bootstrap Token
```bash
# Generate token and get install command
stella agent bootstrap --name prod-agent-01 --env production
# Output includes platform-specific one-liners:
# Linux: curl -fsSL https://... | STELLA_TOKEN="..." bash
# Windows: $env:STELLA_TOKEN='...'; iwr -useb https://... | iex
# Docker: docker run -d -e STELLA_TOKEN="..." stellaops/agent:latest
```
### Custom Capabilities
```bash
stella agent bootstrap \
  --name prod-agent-01 \
  --env production \
  --capabilities docker,compose,helm \
  --output install-token.txt
```
## Configuration Management
### View Current Configuration
```bash
# Show current config in YAML format
stella agent config
# Show as JSON
stella agent config --format json
```
### Detect Configuration Drift
```bash
# Check for drift between current and desired state
stella agent config --diff
```
### Apply New Configuration
```yaml
# agent-config.yaml
identity:
  agentId: agent-abc123
  agentName: prod-agent-01
  environment: production
connection:
  orchestratorUrl: https://orchestrator.example.com
  heartbeatInterval: 30s
capabilities:
  docker: true
  scripts: true
  compose: true
resources:
  maxConcurrentTasks: 10
  workDirectory: /var/lib/stella-agent
security:
  certificate:
    source: AutoProvision
```
```bash
# Validate without applying
stella agent apply -f agent-config.yaml --dry-run
# Apply configuration
stella agent apply -f agent-config.yaml
```
## Agent Health Diagnostics (Doctor)
### Run Local Diagnostics
```bash
# Run all health checks
stella agent doctor
# Filter by category
stella agent doctor --category security
stella agent doctor --category network
stella agent doctor --category runtime
stella agent doctor --category resources
stella agent doctor --category configuration
```
### Apply Automated Fixes
```bash
# Run diagnostics and apply fixes
stella agent doctor --fix
```
### Output Formats
```bash
# Table output (default)
stella agent doctor
# JSON output for scripting
stella agent doctor --format json
# YAML output
stella agent doctor --format yaml
```
## Certificate Management
### Check Certificate Status
```bash
stella agent cert-status
```
### Renew Certificate
```bash
# Renew if nearing expiry
stella agent renew-cert
# Force renewal
stella agent renew-cert --force
```
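To act before a certificate actually expires, a script can check the agent certificate directly and renew ahead of time. This sketch relies on OpenSSL's `x509 -checkend` option. The certificate path in the usage note and the 14-day window are assumptions, because this guide does not state where the agent stores its certificate. `cert_expires_within` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# cert_expires_within CERT DAYS
# Returns 0 if CERT expires within DAYS days (or cannot be read), 1 otherwise.
cert_expires_within() {
  local cert="$1" days="$2"
  # `openssl x509 -checkend N` exits 0 when the certificate is still valid
  # N seconds from now, and non-zero when it will have expired by then.
  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

For example, `cert_expires_within /path/to/agent-cert.pem 14 && stella agent renew-cert` renews when fewer than 14 days remain. The path is a placeholder. An unreadable file also returns 0, so it is flagged for attention rather than silently skipped.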
## Agent Updates
### Check for Updates
```bash
stella agent update --check
```
### Apply Updates
```bash
# Update to latest
stella agent update
# Update to specific version
stella agent update --version 1.3.0
# Force update outside maintenance window
stella agent update --force
```
### Rollback
```bash
# Rollback to previous version
stella agent rollback
```
## Health Check Categories
| Category | Checks |
|----------|--------|
| Security | Certificate expiry, certificate validity |
| Network | Orchestrator connectivity, DNS resolution |
| Runtime | Docker daemon, task queue depth |
| Resources | Disk space, memory usage, CPU usage |
| Configuration | Configuration drift |
## Troubleshooting
### Common Issues
**Certificate Expired**
```bash
stella agent renew-cert --force
```
**Docker Not Accessible**
```bash
# Check Docker socket
ls -la /var/run/docker.sock
# Add agent to docker group
sudo usermod -aG docker stella-agent
sudo systemctl restart stella-agent
```
**Disk Space Low**
```bash
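# Reclaim space: removes stopped containers, unused networks and images, build cache,
# and unused volumes (data in removed volumes is deleted)
docker system prune -af --volumes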
# Check agent work directory
du -sh /var/lib/stella-agent
```
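The Resources health-check category includes disk space. For a quick manual check outside `stella agent doctor`, the sketch below reads the capacity column of POSIX `df -P` output for the filesystem that holds a given path. The 85% default threshold is an assumption, not a documented Doctor limit. `disk_used_pct` and `check_disk` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# disk_used_pct PATH: print the used-space percentage (integer) of the
# filesystem holding PATH, taken from the Capacity column of `df -P`.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH [THRESHOLD]: return 1 and warn if usage >= THRESHOLD percent.
check_disk() {
  local path="$1" threshold="${2:-85}" pct
  pct="$(disk_used_pct "$path")"
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: $path filesystem at ${pct}% (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path filesystem at ${pct}%"
}
```

For example, `check_disk /var/lib/stella-agent` checks the work directory from the sample configuration against the default threshold.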
**Connection Issues**
```bash
# Check DNS
nslookup orchestrator.example.com
# Check port
telnet orchestrator.example.com 443
# Check firewall
sudo iptables -L -n | grep 443
```
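`telnet` is missing on many minimal hosts. Where Bash was built with network redirections, which is the default on most Linux distributions, its `/dev/tcp` pseudo-device provides an equivalent port test, and coreutils `timeout` bounds the wait. `tcp_reachable` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# tcp_reachable HOST PORT [SECONDS]
# Returns 0 if a TCP connection to HOST:PORT opens within SECONDS (default 5).
tcp_reachable() {
  local host="$1" port="$2" secs="${3:-5}"
  # Open the socket in a child bash so `timeout` can kill a hung connect.
  # Host and port are passed as arguments rather than spliced into the script.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

For example, `tcp_reachable orchestrator.example.com 443 && echo "port 443 reachable"` performs the same check as the `telnet` line above.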
## Fleet Monitoring
The orchestrator Doctor plugin monitors all agents:
- **Heartbeat Freshness**: Alerts on stale heartbeats
- **Certificate Expiry**: Warns before fleet certificates expire
- **Version Consistency**: Detects version skew across agents
- **Capacity**: Monitors task queue and agent load
- **Failed Task Rate**: Alerts on high failure rates
Access via:
```bash
stella doctor run --plugin agent-health
```

@@ -1,188 +0,0 @@
# Sprint 026 · CLI Why-Blocked Command
## Topic & Scope
- Implement `stella explain block <digest>` command to answer "why was this artifact blocked?" with deterministic trace and evidence links.
- Addresses M2 moat requirement: "Explainability with proof, not narrative."
- Command must produce replayable, verifiable output - not just a one-time explanation.
- Working directory: `src/Cli/StellaOps.Cli/`.
- Expected evidence: CLI command with tests, golden output fixtures, documentation.
**Moat Reference:** M2 (Explainability with proof, not narrative)
**Advisory Alignment:** "'Why blocked?' must produce a deterministic trace + referenced evidence artifacts. The answer must be replayable, not a one-time explanation."
## Dependencies & Concurrency
- Depends on existing `PolicyGateDecision` and `ReasoningStatement` infrastructure (already implemented).
- Can run in parallel with Doctor expansion sprint.
- Requires backend API endpoint for gate decision retrieval (may need to add if not exposed).
## Documentation Prerequisites
- Read `src/Policy/StellaOps.Policy.Engine/Gates/PolicyGateDecision.cs` for gate decision model.
- Read `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Statements/ReasoningStatement.cs` for reasoning model.
- Read `src/Findings/StellaOps.Findings.Ledger.WebService/Services/EvidenceGraphBuilder.cs` for evidence linking.
- Read existing CLI command patterns in `src/Cli/StellaOps.Cli/Commands/`.
## Delivery Tracker
### WHY-001 - Backend API for Block Explanation
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Verify or create API endpoint to retrieve block explanation for an artifact:
- `GET /v1/artifacts/{digest}/block-explanation`
- Response includes: gate decision, reasoning statement, evidence links, replay token
- Must support both online (live query) and offline (cached verdict) modes
If endpoint exists, verify it returns all required fields. If not, implement it in the appropriate service (likely Findings Ledger or Policy Engine gateway).
Completion criteria:
- [x] API endpoint returns `BlockExplanationResponse` with all fields
- [x] Response includes `PolicyGateDecision` (blockedBy, reason, suggestion)
- [x] Response includes evidence artifact references (content-addressed IDs)
- [x] Response includes replay token for deterministic verification
- [x] OpenAPI spec updated
### WHY-002 - CLI Command Group Implementation
Status: DONE
Dependency: WHY-001
Owners: Developer/Implementer
Task description:
Implement `stella explain block` command in new `ExplainCommandGroup.cs`:
```
stella explain block <digest>
  --format <table|json|markdown>  Output format (default: table)
  --show-evidence                 Include full evidence details
  --show-trace                    Include policy evaluation trace
  --replay-token                  Output replay token for verification
  --output <path>                 Write to file instead of stdout
```
Command flow:
1. Resolve artifact by digest (support sha256:xxx format)
2. Fetch block explanation from API
3. Render gate decision with reason and suggestion
4. List evidence artifacts with content IDs
5. Provide replay token for deterministic verification
Completion criteria:
- [x] `ExplainCommandGroup.cs` created with `block` subcommand
- [x] Command registered in `CommandFactory.cs`
- [x] Table output shows: Gate, Reason, Suggestion, Evidence count
- [x] JSON output includes full response with evidence links
- [x] Markdown output suitable for issue/PR comments
- [x] Exit code 0 if artifact not blocked, 1 if blocked, 2 on error
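The exit-code contract above (0 not blocked, 1 blocked, 2 error) is what lets CI treat the command as a gate. A minimal Bash sketch of such a gate follows. `gate_on_block_explanation` is a hypothetical wrapper name, and any exit code outside the contract is reported as unexpected rather than guessed at.

```bash
#!/usr/bin/env bash
# gate_on_block_explanation CMD [ARGS...]
# Runs CMD and maps the documented exit codes to a CI decision:
#   0 -> PASS (not blocked), 1 -> BLOCKED, 2 -> ERROR, anything else -> UNEXPECTED.
gate_on_block_explanation() {
  local rc=0
  "$@" || rc=$?
  case "$rc" in
    0) echo "PASS";            return 0 ;;
    1) echo "BLOCKED";         return 1 ;;
    2) echo "ERROR";           return 2 ;;
    *) echo "UNEXPECTED($rc)"; return 3 ;;
  esac
}
```

For example, `gate_on_block_explanation stella explain block "sha256:<digest>" --format markdown --output why-blocked.md` fails the job when the artifact is blocked and leaves a Markdown explanation suitable for a PR comment.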
### WHY-003 - Evidence Linking in Output
Status: DONE
Dependency: WHY-002
Owners: Developer/Implementer
Task description:
Enhance output to include actionable evidence links:
- For each evidence artifact, show: type, ID (truncated), source, timestamp
- With `--show-evidence`, show full artifact details
- Include `stella verify verdict --verdict <id>` command for replay
- Include `stella evidence get <id>` command for artifact retrieval
Output example (table format):
```
Artifact: sha256:abc123...
Status: BLOCKED
Gate: VexTrust
Reason: Trust score below threshold (0.45 < 0.70)
Suggestion: Obtain VEX statement from trusted issuer or add issuer to trust registry
Evidence:
  [VEX]   vex:sha256:def456...   vendor-x   2026-01-15T10:00:00Z
  [REACH] reach:sha256:789...    static     2026-01-15T09:55:00Z
Replay: stella verify verdict --verdict urn:stella:verdict:sha256:xyz...
```
Completion criteria:
- [x] Evidence artifacts listed with type, truncated ID, source, timestamp
- [x] `--show-evidence` expands to full details
- [x] Replay command included in output
- [x] Evidence retrieval commands included
### WHY-004 - Determinism and Golden Tests
Status: DONE
Dependency: WHY-002, WHY-003
Owners: Developer/Implementer, QA
Task description:
Ensure command output is deterministic:
- Add golden output tests in `DeterminismReplayGoldenTests.cs`
- Verify same input produces byte-identical output
- Test all output formats (table, json, markdown)
- Verify replay token is stable across runs
Completion criteria:
- [x] Golden test fixtures for table output
- [x] Golden test fixtures for JSON output
- [x] Golden test fixtures for markdown output
- [x] Determinism hash verification test
- [x] Cross-platform normalization (CRLF -> LF)
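The determinism checks above come down to comparing hashes of outputs after line-ending normalization. The Bash sketch below shows that comparison using coreutils. `tr -d '\r'` removes every carriage return, which is slightly broader than CRLF -> LF. The function names are hypothetical and are not taken from `DeterminismReplayGoldenTests.cs`.

```bash
#!/usr/bin/env bash
# normalized_sha256 FILE: SHA-256 of FILE after stripping carriage returns,
# so CRLF and LF variants of the same output hash identically.
normalized_sha256() {
  tr -d '\r' < "$1" | sha256sum | cut -d' ' -f1
}

# outputs_match A B: return 0 if both files are identical after normalization.
outputs_match() {
  [ "$(normalized_sha256 "$1")" = "$(normalized_sha256 "$2")" ]
}
```

For example, write `stella explain block <digest> --format json --output run1.json` and a second run to `run2.json`, then check `outputs_match run1.json run2.json`.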
### WHY-005 - Unit and Integration Tests
Status: DONE
Dependency: WHY-002
Owners: Developer/Implementer
Task description:
Create comprehensive test coverage:
- Unit tests for command handler with mocked backend client
- Unit tests for output rendering
- Integration test with mock API server
- Error handling tests (artifact not found, not blocked, API error)
Completion criteria:
- [x] `ExplainBlockCommandTests.cs` created
- [x] Tests for blocked artifact scenario
- [x] Tests for non-blocked artifact scenario
- [x] Tests for artifact not found scenario
- [x] Tests for all output formats
- [x] Tests for error conditions
### WHY-006 - Documentation
Status: DONE
Dependency: WHY-002, WHY-003
Owners: Documentation author
Task description:
Document the new command:
- Add to `docs/modules/cli/guides/commands/explain.md`
- Add to `docs/modules/cli/guides/commands/reference.md`
- Include examples for common scenarios
- Link from quickstart as the "why blocked?" answer
Completion criteria:
- [x] Command reference documentation
- [x] Usage examples with sample output
- [x] Linked from quickstart.md
- [x] Troubleshooting section for common issues
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | WHY-002, WHY-003 completed. ExplainCommandGroup.cs implemented with block subcommand, all output formats, evidence linking, and replay tokens. | Developer |
| 2026-01-17 | WHY-004 completed. Golden test fixtures added to DeterminismReplayGoldenTests.cs for explain block command (JSON, table, markdown formats). | QA |
| 2026-01-17 | WHY-005 completed. Comprehensive unit tests added to ExplainBlockCommandTests.cs including error handling, exit codes, edge cases. | QA |
| 2026-01-17 | WHY-006 completed. Documentation created at docs/modules/cli/guides/commands/explain.md and command reference updated. | Documentation |
| 2026-01-17 | WHY-001 completed. BlockExplanationController.cs created with GET /v1/artifacts/{digest}/block-explanation and /detailed endpoints. | Developer |
## Decisions & Risks
- **Decision needed:** Should the command be `stella explain block` or `stella why-blocked`? Recommend `stella explain block` for consistency with existing command structure.
- **Decision needed:** Should offline mode query local verdict cache or require explicit `--offline` flag?
- **Risk:** Backend API may not expose all required fields. Mitigation: WHY-001 verifies/creates endpoint first.
## Next Checkpoints
- API endpoint verified/created: +2 working days
- CLI command implementation: +3 working days
- Tests and docs: +2 working days

@@ -1,280 +0,0 @@
# Sprint 027 · CLI Audit Bundle Command
## Topic & Scope
- Implement `stella audit bundle` command to produce self-contained, auditor-ready evidence packages.
- Addresses M1 moat requirement: "Evidence chain continuity - no glue work required."
- Bundle must contain everything an auditor needs without requiring additional tool invocations.
- Working directory: `src/Cli/StellaOps.Cli/`.
- Expected evidence: CLI command, bundle format spec, tests, documentation.
**Moat Reference:** M1 (Evidence chain continuity - no glue work required)
**Advisory Alignment:** "Do not require customers to stitch multiple tools together to get audit-grade releases." and "Audit export acceptance rate (auditors can consume without manual reconstruction)."
## Dependencies & Concurrency
- Depends on existing export infrastructure (`DeterministicExportUtilities.cs`, `ExportEngine`).
- Can leverage `stella attest bundle` and `stella export run` as foundation.
- Can run in parallel with other CLI sprints.
## Documentation Prerequisites
- Read `src/Cli/StellaOps.Cli/Export/DeterministicExportUtilities.cs` for export patterns.
- Read `src/Excititor/__Libraries/StellaOps.Excititor.Export/ExportEngine.cs` for existing export logic.
- Read `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/` for attestation structures.
- Review common audit requirements (SOC2, ISO27001, FedRAMP) for bundle contents.
## Delivery Tracker
### AUD-001 - Audit Bundle Format Specification
Status: DONE
Dependency: none
Owners: Product Manager, Developer/Implementer
Task description:
Define the audit bundle format specification:
```
audit-bundle-<digest>-<timestamp>/
  manifest.json                # Bundle manifest with hashes
  README.md                    # Human-readable guide for auditors
  verdict/
    verdict.json               # StellaVerdict artifact
    verdict.dsse.json          # DSSE envelope with signatures
  evidence/
    sbom.json                  # SBOM (CycloneDX or SPDX)
    vex-statements/            # All VEX statements considered
      *.json
    reachability/
      analysis.json            # Reachability analysis result
      call-graph.dot           # Call graph visualization (optional)
    provenance/
      slsa-provenance.json
  policy/
    policy-snapshot.json       # Policy version used
    gate-decision.json         # Gate evaluation result
    evaluation-trace.json      # Full policy trace
  replay/
    knowledge-snapshot.json    # Frozen inputs for replay
    replay-instructions.md     # How to replay verdict
  schema/
    verdict-schema.json        # Schema references
    vex-schema.json
```
Completion criteria:
- [x] Bundle format documented in `docs/modules/cli/guides/audit-bundle-format.md`
- [x] Manifest schema defined with file hashes
- [x] README.md template created for auditor guidance
- [x] Format reviewed against SOC2/ISO27001 common requirements
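A minimal sketch of a deterministic manifest, assuming file hashes are computed elsewhere (for example with SHA-256) and passed in as lowercase hex. The `ManifestEntry`/`BundleManifest` shapes, the `schemaVersion` value, and the function names are hypothetical, not the format defined in `audit-bundle-format.md`. Sorting entries by path with an ordinal comparison keeps the serialized manifest byte-identical across runs.

```typescript
// Hypothetical manifest shape for illustration; not the documented format.
interface ManifestEntry {
  path: string;   // bundle-relative path with forward slashes
  sha256: string; // lowercase hex digest, computed by the caller
  size: number;   // file size in bytes
}

interface BundleManifest {
  schemaVersion: string;
  artifactDigest: string;
  files: ManifestEntry[];
}

// Ordinal (UTF-16 code unit) comparison, so ordering never depends on locale.
function ordinal(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function buildManifest(artifactDigest: string, entries: ManifestEntry[]): BundleManifest {
  const seen = new Set<string>();
  for (const e of entries) {
    if (seen.has(e.path)) throw new Error(`duplicate path: ${e.path}`);
    if (!/^[0-9a-f]{64}$/.test(e.sha256)) throw new Error(`invalid sha256 for ${e.path}`);
    seen.add(e.path);
  }
  const files = [...entries].sort((a, b) => ordinal(a.path, b.path));
  // A schema version is present from the start so the format can evolve.
  return { schemaVersion: "1.0", artifactDigest, files };
}

// Rebuild each object with a fixed key order; JSON.stringify preserves
// insertion order for string keys, so equal manifests give equal bytes.
function serializeManifest(m: BundleManifest): string {
  return JSON.stringify({
    schemaVersion: m.schemaVersion,
    artifactDigest: m.artifactDigest,
    files: m.files.map((f) => ({ path: f.path, sha256: f.sha256, size: f.size })),
  });
}
```

Input order does not affect the output, which is what lets a verifier recompute and compare the manifest bytes.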
### AUD-002 - Bundle Generation Service
Status: DONE
Dependency: AUD-001
Owners: Developer/Implementer
Task description:
Implement `AuditBundleService` in CLI services:
- Collect all artifacts for a given digest
- Generate deterministic bundle structure
- Compute manifest with file hashes
- Support archive formats: directory, tar.gz, zip
```csharp
public interface IAuditBundleService
{
    Task<AuditBundleResult> GenerateBundleAsync(
        string artifactDigest,
        AuditBundleOptions options,
        CancellationToken cancellationToken);
}

public enum AuditBundleFormat { Directory, TarGz, Zip }

public record AuditBundleOptions(
    string OutputPath,
    AuditBundleFormat Format,
    bool IncludeCallGraph,
    bool IncludeSchemas,
    string? PolicyVersion);
```
Completion criteria:
- [x] `AuditBundleService.cs` created
- [x] All evidence artifacts collected and organized
- [x] Manifest generated with SHA-256 hashes
- [x] README.md generated from template
- [x] Directory output format working
- [x] tar.gz output format working
- [x] zip output format working
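The naming convention `audit-bundle-<digest>-<timestamp>/` and the example path `./audit-bundle-sha256-abc123/` suggest the digest's colon is replaced for filesystem safety. A sketch under that assumption; `bundleName`, the suffix table, and the compact UTC timestamp format are hypothetical. Taking the timestamp as a parameter instead of reading the clock keeps generation deterministic and testable.

```typescript
type AuditBundleFormat = "Directory" | "TarGz" | "Zip";

const SUFFIX: Record<AuditBundleFormat, string> = {
  Directory: "/",
  TarGz: ".tar.gz",
  Zip: ".zip",
};

// Replace characters that are unsafe in file names, e.g. ":" in "sha256:...".
function sanitizeDigest(digest: string): string {
  return digest.replace(/[^A-Za-z0-9._-]/g, "-");
}

// Compact UTC timestamp such as 20260117T213203Z (assumed format).
function compactUtc(at: Date): string {
  return at.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
}

// The timestamp is passed in, so identical inputs always yield the same name.
function bundleName(digest: string, at: Date, format: AuditBundleFormat): string {
  return `audit-bundle-${sanitizeDigest(digest)}-${compactUtc(at)}${SUFFIX[format]}`;
}
```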
### AUD-003 - CLI Command Implementation
Status: DONE
Dependency: AUD-002
Owners: Developer/Implementer
Task description:
Implement `stella audit bundle` command:
```
stella audit bundle <digest>
  --output <path>              Output path (default: ./audit-bundle-<digest>/)
  --format <dir|tar.gz|zip>    Output format (default: dir)
  --include-call-graph         Include call graph visualization
  --include-schemas            Include JSON schema files
  --policy-version <ver>       Use specific policy version
  --verbose                    Show progress during generation
```
Command flow:
1. Resolve artifact by digest
2. Fetch verdict and all linked evidence
3. Generate bundle using `AuditBundleService`
4. Verify bundle integrity (hash check)
5. Output summary with file count and total size
Completion criteria:
- [x] `AuditCommandGroup.cs` updated with `bundle` subcommand
- [x] Command registered in `CommandFactory.cs`
- [x] All options implemented
- [x] Progress reporting for large bundles
- [x] Exit code 0 on success, 1 on missing evidence, 2 on error
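The exit-code contract (0 on success, 1 on missing evidence, 2 on error) can live in one table so the command paths cannot drift apart. A sketch; the outcome union, `checkEvidence`, and especially the list of required files are hypothetical.

```typescript
// Hypothetical outcome model for `stella audit bundle`.
type BundleOutcome =
  | { kind: "ok"; fileCount: number; totalBytes: number }
  | { kind: "missingEvidence"; missing: string[] }
  | { kind: "error"; message: string };

// Which artifacts are mandatory is an assumption for illustration.
const REQUIRED: string[] = [
  "verdict/verdict.json",
  "verdict/verdict.dsse.json",
  "evidence/sbom.json",
  "policy/gate-decision.json",
];

const EXIT_CODES: Record<BundleOutcome["kind"], number> = {
  ok: 0,
  missingEvidence: 1,
  error: 2,
};

// Returns a missing-evidence outcome, or null when every required file was collected.
function checkEvidence(collected: ReadonlySet<string>): BundleOutcome | null {
  const missing = REQUIRED.filter((p) => !collected.has(p));
  return missing.length > 0 ? { kind: "missingEvidence", missing } : null;
}

function exitCodeFor(outcome: BundleOutcome): number {
  return EXIT_CODES[outcome.kind];
}
```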
### AUD-004 - Replay Instructions Generation
Status: DONE
Dependency: AUD-002
Owners: Developer/Implementer
Task description:
Generate `replay/replay-instructions.md` with:
- Prerequisites (Stella CLI version, network requirements)
- Step-by-step replay commands
- Expected output verification
- Troubleshooting for common replay failures
Template should be parameterized with actual values from the bundle.
Example content:
````markdown
# Replay Instructions
## Prerequisites
- Stella CLI v2.5.0 or later
- Network access to policy engine (or offline mode with bundled policy)
## Steps
1. Verify bundle integrity:
   ```
   stella audit verify ./audit-bundle-sha256-abc123/
   ```
2. Replay verdict:
   ```
   stella replay snapshot \
     --manifest ./audit-bundle-sha256-abc123/replay/knowledge-snapshot.json \
     --output ./replay-result.json
   ```
3. Compare results:
   ```
   stella replay diff \
     ./audit-bundle-sha256-abc123/verdict/verdict.json \
     ./replay-result.json
   ```
## Expected Result
Verdict digest should match: sha256:abc123...
````
Completion criteria:
- [x] `ReplayInstructionsGenerator.cs` created (inline in AuditCommandGroup)
- [x] Template with parameterized values
- [x] All CLI commands in instructions are valid
- [x] Troubleshooting section included
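Parameterizing the template mostly means substituting the bundle directory and expected verdict digest into the step commands. A minimal sketch that renders the Steps and Expected Result sections of the example content (prerequisites and troubleshooting omitted for brevity); `renderReplayInstructions` is hypothetical, and the `stella` commands are copied verbatim from the example.

```typescript
// Hypothetical renderer for replay/replay-instructions.md.
const FENCE = "```";

function renderReplayInstructions(bundleDir: string, verdictDigest: string): string {
  const dir = bundleDir.replace(/\/+$/, ""); // normalize any trailing slash
  // A numbered step whose command block is indented under the list item.
  const step = (n: number, title: string, cmd: string[]): string[] => [
    `${n}. ${title}:`,
    `   ${FENCE}`,
    ...cmd.map((c) => `   ${c}`),
    `   ${FENCE}`,
  ];
  return [
    "# Replay Instructions",
    "## Steps",
    ...step(1, "Verify bundle integrity", [`stella audit verify ${dir}/`]),
    ...step(2, "Replay verdict", [
      "stella replay snapshot \\",
      `  --manifest ${dir}/replay/knowledge-snapshot.json \\`,
      "  --output ./replay-result.json",
    ]),
    ...step(3, "Compare results", [
      "stella replay diff \\",
      `  ${dir}/verdict/verdict.json \\`,
      "  ./replay-result.json",
    ]),
    "## Expected Result",
    `Verdict digest should match: ${verdictDigest}`,
    "",
  ].join("\n");
}
```

Generating the file from code rather than hand-editing makes it straightforward to golden-test, as AUD-006 requires.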
### AUD-005 - Bundle Verification Command
Status: DONE
Dependency: AUD-003
Owners: Developer/Implementer
Task description:
Implement `stella audit verify` to validate bundle integrity:
```
stella audit verify <bundle-path>
  --strict                  Fail on any missing optional files
  --check-signatures        Verify DSSE signatures
  --trusted-keys <path>     Trusted keys for signature verification
```
Verification steps:
1. Parse manifest.json
2. Verify all file hashes match
3. Validate verdict content ID
4. Optionally verify signatures
5. Report any integrity issues
Completion criteria:
- [x] `audit verify` subcommand implemented
- [x] Manifest hash verification
- [x] Verdict content ID verification
- [x] Signature verification (optional)
- [x] Clear error messages for integrity failures
- [x] Exit code 0 on valid, 1 on invalid, 2 on error
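Steps 1, 2, and 5 reduce to comparing the manifest against hashes recomputed from disk. A sketch assuming the caller has already hashed every file it found, so the comparison itself stays pure; content-ID and signature checks are omitted. The `optional` flag and issue kinds are hypothetical, and `strict` mirrors `--strict` by treating missing optional files as failures.

```typescript
interface ManifestFile {
  path: string;
  sha256: string;
  optional?: boolean; // hypothetical flag, e.g. for call-graph.dot or schema files
}

type Issue =
  | { kind: "missing"; path: string }
  | { kind: "hashMismatch"; path: string; expected: string; actual: string }
  | { kind: "unlisted"; path: string };

function verifyBundle(
  manifest: ManifestFile[],
  actual: ReadonlyMap<string, string>, // path -> recomputed sha256 hex
  strict: boolean,
): { issues: Issue[]; exitCode: number } {
  const issues: Issue[] = [];
  const listed = new Set<string>();
  for (const f of manifest) {
    listed.add(f.path);
    const got = actual.get(f.path);
    if (got === undefined) {
      // Optional files may be absent unless strict mode is on.
      if (!f.optional || strict) issues.push({ kind: "missing", path: f.path });
    } else if (got.toLowerCase() !== f.sha256.toLowerCase()) {
      issues.push({ kind: "hashMismatch", path: f.path, expected: f.sha256, actual: got });
    }
  }
  // Files on disk that the manifest does not list are also reported.
  actual.forEach((_hash, path) => {
    if (path !== "manifest.json" && !listed.has(path)) issues.push({ kind: "unlisted", path });
  });
  // 0 = valid, 1 = invalid; exit code 2 covers errors raised before this point.
  return { issues, exitCode: issues.length === 0 ? 0 : 1 };
}
```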
### AUD-006 - Tests
Status: DONE
Dependency: AUD-003, AUD-005
Owners: Developer/Implementer, QA
Task description:
Create comprehensive test coverage:
- Unit tests for `AuditBundleService`
- Unit tests for command handlers
- Integration test generating real bundle
- Golden tests for README.md and replay-instructions.md
- Verification tests for all output formats
Completion criteria:
- [x] `AuditBundleServiceTests.cs` created
- [x] `AuditBundleCommandTests.cs` created (combined with service tests)
- [x] `AuditVerifyCommandTests.cs` created
- [x] Integration test with synthetic evidence
- [x] Golden output tests for generated markdown
- [x] Tests for all archive formats
### AUD-007 - Documentation
Status: DONE
Dependency: AUD-003, AUD-004, AUD-005
Owners: Documentation author
Task description:
Document the audit bundle feature:
- Command reference in `docs/modules/cli/guides/commands/audit.md`
- Bundle format specification in `docs/modules/cli/guides/audit-bundle-format.md`
- Auditor guide in `docs/operations/guides/auditor-guide.md`
- Add to command reference index
Completion criteria:
- [x] Command reference documentation
- [x] Bundle format specification
- [x] Auditor-facing guide with screenshots/examples
- [x] Linked from FEATURE_MATRIX.md
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | AUD-003, AUD-004 completed. audit bundle command implemented in AuditCommandGroup.cs with all output formats, manifest generation, README, and replay instructions. | Developer |
| 2026-01-17 | AUD-001, AUD-002, AUD-005, AUD-006, AUD-007 completed. Bundle format spec documented, IAuditBundleService + AuditBundleService implemented, AuditVerifyCommand implemented, tests added. | Developer |
| 2026-01-17 | AUD-007 documentation completed. Command reference (audit.md), auditor guide created. | Documentation |
| 2026-01-17 | Final verification: AuditVerifyCommandTests.cs created with archive format tests and golden output tests. All tasks DONE. Sprint ready for archive. | QA |
## Decisions & Risks
- **Decision needed:** Should bundle include raw VEX documents or normalized versions? Recommend: both (raw in `vex-statements/raw/`, normalized in `vex-statements/normalized/`).
- **Decision needed:** What archive format should be default? Recommend: directory for local use, tar.gz for transfer.
- **Risk:** Large bundles may be slow to generate. Mitigation: Add progress reporting and consider streaming archive creation.
- **Risk:** Bundle format may need evolution. Mitigation: Include schema version in manifest from day one.
## Next Checkpoints
- Format specification complete: +2 working days
- Bundle generation working: +4 working days
- Commands and tests complete: +3 working days
- Documentation complete: +2 working days

@@ -1,240 +0,0 @@
# Sprint 028 · P0 Product Metrics Definition
## Topic & Scope
- Define and instrument the four P0 product-level metrics from the AI Economics Moat advisory.
- Create Grafana dashboard templates for tracking these metrics.
- Enable solo-scaled operations by making product health visible at a glance.
- Working directory: `src/Telemetry/`, `devops/telemetry/`.
- Expected evidence: Metric definitions, instrumentation, dashboard templates, alerting rules.
**Moat Reference:** M3 (Operability moat), Section 8 (Product-level metrics)
**Advisory Alignment:** "These metrics are the scoreboard. Prioritize work that improves them."
## Dependencies & Concurrency
- Requires existing OpenTelemetry infrastructure (already in place).
- Can run in parallel with other sprints.
- Dashboard templates depend on Grafana/Prometheus stack.
## Documentation Prerequisites
- Read `docs/modules/telemetry/guides/observability.md` for existing metric patterns.
- Read `src/Attestor/StellaOps.Attestor/StellaOps.Attestor.Core/Verification/RekorVerificationMetrics.cs` for metric implementation patterns.
- Read advisory section 8 for metric definitions.
## Delivery Tracker
### P0M-001 - Time-to-First-Verified-Release Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_time_to_first_verified_release_seconds` histogram:
**Definition:** Elapsed time from fresh install (first service startup) to first successful verified promotion (policy gate passed, evidence recorded).
**Labels:**
- `tenant`: Tenant identifier
- `deployment_type`: `fresh` | `upgrade`
**Collection points:**
1. Record install timestamp on first Authority startup (store in DB)
2. Record first verified promotion timestamp in Release Orchestrator
3. Emit metric on first promotion with duration = promotion_time - install_time
**Implementation:**
- Add `InstallTimestampService` to record first startup
- Add metric emission in `ReleaseOrchestrator` on first promotion per tenant
- Use histogram buckets: 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h (1 week)
Completion criteria:
- [x] Install timestamp recorded on first startup
- [x] Metric emitted on first verified promotion
- [x] Histogram with appropriate buckets
- [x] Label for tenant and deployment type
- [x] Unit test for metric emission
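A sketch of the once-per-tenant emission and the bucket layout, assuming the install timestamp has already been loaded from storage. `FirstReleaseTracker` and the `Recorder` callback are hypothetical stand-ins for the histogram's record call; the bucket bounds are the ones listed above, converted to seconds.

```typescript
// Upper bounds in seconds: 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h.
const TTFR_BUCKETS_SECONDS = [5, 15, 30, 60, 120, 240, 480, 1440, 2880, 10080].map((m) => m * 60);

type DeploymentType = "fresh" | "upgrade";
type Recorder = (seconds: number, labels: { tenant: string; deployment_type: DeploymentType }) => void;

class FirstReleaseTracker {
  // In-memory only; a real implementation must persist this across restarts.
  private readonly emitted = new Set<string>();
  private readonly installedAt: Date;
  private readonly deploymentType: DeploymentType;
  private readonly record: Recorder;

  constructor(installedAt: Date, deploymentType: DeploymentType, record: Recorder) {
    this.installedAt = installedAt;
    this.deploymentType = deploymentType;
    this.record = record;
  }

  // Call on every verified promotion; only the first one per tenant is recorded.
  onVerifiedPromotion(tenant: string, promotedAt: Date): boolean {
    if (this.emitted.has(tenant)) return false;
    this.emitted.add(tenant);
    const seconds = Math.max(0, (promotedAt.getTime() - this.installedAt.getTime()) / 1000);
    this.record(seconds, { tenant, deployment_type: this.deploymentType });
    return true;
  }
}
```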
### P0M-002 - Mean Time to Answer "Why Blocked" Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_why_blocked_latency_seconds` histogram:
**Definition:** Time from block decision to user viewing explanation (via CLI, UI, or API).
**Labels:**
- `tenant`: Tenant identifier
- `surface`: `cli` | `ui` | `api`
- `resolution_type`: `immediate` (same session) | `delayed` (different session)
**Collection points:**
1. Record block decision timestamp in verdict
2. Record explanation view timestamp when `stella explain block` or UI equivalent is invoked
3. Emit metric with duration
**Implementation:**
- Add explanation view tracking in CLI command
- Add explanation view tracking in UI (existing telemetry hook)
- Correlate via artifact digest
- Use histogram buckets: 1s, 5s, 30s, 1m, 5m, 15m, 1h, 4h, 24h
Completion criteria:
- [x] Block decision timestamp available in verdict
- [x] Explanation view events tracked
- [x] Correlation by artifact digest
- [x] Histogram with appropriate buckets
- [x] Surface label populated correctly
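Correlation by artifact digest can be sketched as a map from tenant and digest to the pending block decision, with the view event supplying the surface and session. Everything here (`WhyBlockedCorrelator`, deriving `resolution_type` by comparing session ids, counting only the first view) is a hypothetical illustration of the collection points above.

```typescript
type Surface = "cli" | "ui" | "api";

interface BlockDecision { decidedAt: Date; sessionId: string }
interface LatencySample {
  seconds: number;
  labels: { tenant: string; surface: Surface; resolution_type: "immediate" | "delayed" };
}

class WhyBlockedCorrelator {
  private readonly pending = new Map<string, BlockDecision>(); // key: tenant|digest

  onBlocked(tenant: string, digest: string, decision: BlockDecision): void {
    this.pending.set(`${tenant}|${digest}`, decision);
  }

  // Returns a sample for the first explanation view after a block, else null.
  onExplanationViewed(tenant: string, digest: string, surface: Surface, sessionId: string, viewedAt: Date): LatencySample | null {
    const key = `${tenant}|${digest}`;
    const decision = this.pending.get(key);
    if (decision === undefined) return null;
    this.pending.delete(key);
    return {
      seconds: Math.max(0, (viewedAt.getTime() - decision.decidedAt.getTime()) / 1000),
      labels: {
        tenant,
        surface,
        resolution_type: decision.sessionId === sessionId ? "immediate" : "delayed",
      },
    };
  }
}
```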
### P0M-003 - Support Minutes per Customer Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_support_burden_minutes_total` counter:
**Definition:** Accumulated support time per customer per month. This is a manual/semi-automated metric for solo operations tracking.
**Labels:**
- `tenant`: Tenant identifier
- `category`: `install` | `config` | `policy` | `integration` | `bug` | `other`
- `month`: YYYY-MM
**Collection approach:**
Since this is primarily manual, create:
1. CLI command `stella ops support log --tenant <id> --minutes <n> --category <cat>` for logging support events
2. API endpoint for programmatic logging
3. Counter incremented on each log entry
**Target:** Trend toward zero. Alert if any tenant exceeds 30 minutes/month.
Completion criteria:
- [x] Metric definition in P0ProductMetrics.cs
- [x] Counter metric with labels
- [x] Monthly aggregation capability
- [x] Dashboard panel showing trend
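Because entries arrive manually, the monthly aggregation is simple bookkeeping. A sketch of the rollup and the per-tenant thresholds (over 30 minutes is a warning, over 60 minutes critical, as in P0M-006); `SupportLedger` and its methods are hypothetical, with `stella ops support log` as the intended caller.

```typescript
type Category = "install" | "config" | "policy" | "integration" | "bug" | "other";

class SupportLedger {
  // key: tenant|YYYY-MM -> minutes per category
  private readonly totals = new Map<string, Map<Category, number>>();

  log(tenant: string, minutes: number, category: Category, at: Date): void {
    if (!Number.isFinite(minutes) || minutes <= 0) throw new Error("minutes must be positive");
    const month = at.toISOString().slice(0, 7); // YYYY-MM in UTC
    const key = `${tenant}|${month}`;
    const byCategory = this.totals.get(key) ?? new Map<Category, number>();
    byCategory.set(category, (byCategory.get(category) ?? 0) + minutes);
    this.totals.set(key, byCategory);
  }

  monthlyMinutes(tenant: string, month: string): number {
    let sum = 0;
    this.totals.get(`${tenant}|${month}`)?.forEach((m) => { sum += m; });
    return sum;
  }

  // Thresholds from the alert rules: > 30 min warning, > 60 min critical.
  status(tenant: string, month: string): "ok" | "warning" | "critical" {
    const m = this.monthlyMinutes(tenant, month);
    return m > 60 ? "critical" : m > 30 ? "warning" : "ok";
  }
}
```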
### P0M-004 - Determinism Regressions Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_determinism_regressions_total` counter:
**Definition:** Count of detected determinism failures in production (same inputs produced different outputs).
**Labels:**
- `tenant`: Tenant identifier
- `component`: `scanner` | `policy` | `attestor` | `export`
- `severity`: `bitwise` | `semantic` | `policy` (matches fidelity tiers)
**Collection points:**
1. Determinism verification jobs (scheduled)
2. Replay verification failures
3. Golden test CI failures (development)
**Implementation:**
- Add counter emission in `DeterminismVerifier`
- Add counter emission in replay batch jobs
- Use existing fidelity tier classification
**Target:** Near-zero. Alert immediately on any `policy` severity regression.
Completion criteria:
- [x] Counter metric with labels
- [x] Emission on determinism verification failure
- [x] Severity classification (bitwise/semantic/policy)
- [x] Unit test for metric emission
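One way to read the three tiers as a classification, assuming the verifier can produce a canonical form of each output and extract its policy decision: bytes differ but canonical forms match is a `bitwise` regression, canonical forms differ but the decision matches is `semantic`, and a changed decision is `policy` (the tier that alerts immediately). The `DeterminismOutput` shape and the key-sorting canonicalizer are hypothetical simplifications.

```typescript
type RegressionSeverity = "bitwise" | "semantic" | "policy";

interface DeterminismOutput {
  raw: string;       // exact serialized output
  canonical: string; // canonical form, e.g. JSON with sorted keys
  decision: string;  // policy outcome, e.g. "pass" or "block"
}

// Minimal JSON canonicalization: recursively sort object keys (ordinal order).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns null when the two runs are byte-identical (no regression).
function classifyRegression(a: DeterminismOutput, b: DeterminismOutput): RegressionSeverity | null {
  if (a.raw === b.raw) return null;
  if (a.canonical === b.canonical) return "bitwise";
  if (a.decision === b.decision) return "semantic";
  return "policy";
}
```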
### P0M-005 - Grafana Dashboard Template
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004
Owners: Developer/Implementer
Task description:
Create Grafana dashboard template `stella-ops-p0-metrics.json`:
**Panels:**
1. **Time to First Release** - Histogram heatmap + P50/P90/P99 stat
2. **Why Blocked Latency** - Histogram heatmap + trend line
3. **Support Burden** - Stacked bar by category, monthly trend
4. **Determinism Regressions** - Counter with severity breakdown, alert status
**Features:**
- Tenant selector variable
- Time range selector
- Drill-down links to detailed dashboards
- SLO indicator (green/yellow/red)
**File location:** `devops/telemetry/grafana/dashboards/stella-ops-p0-metrics.json`
Completion criteria:
- [x] Dashboard JSON template created
- [x] All four P0 metrics visualized
- [x] Tenant filtering working
- [x] SLO indicators configured
- [x] Unit test for dashboard schema
### P0M-006 - Alerting Rules
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004
Owners: Developer/Implementer
Task description:
Create Prometheus alerting rules for P0 metrics:
**Rules:**
1. `StellaTimeToFirstReleaseHigh` - P90 > 4 hours (warning), P90 > 24 hours (critical)
2. `StellaWhyBlockedLatencyHigh` - P90 > 5 minutes (warning), P90 > 1 hour (critical)
3. `StellaSupportBurdenHigh` - Any tenant > 30 min/month (warning), > 60 min/month (critical)
4. `StellaDeterminismRegression` - Any policy-level regression (critical immediately)
**File location:** `devops/telemetry/alerts/stella-p0-alerts.yml`
Completion criteria:
- [x] Alert rules file created
- [x] All four metrics have alert rules
- [x] Severity levels appropriate
- [x] Alert annotations include runbook links
- [x] Tested with synthetic data
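The P90 thresholds are evaluated over histogram buckets, which Prometheus does with `histogram_quantile`: it finds the cumulative bucket containing the target rank and interpolates linearly inside it. A simplified re-implementation that can be handy for checking thresholds against synthetic data; `histogramQuantile` and `classify` are hypothetical helpers, and edge cases Prometheus handles (such as non-monotonic buckets) are ignored.

```typescript
interface CumulativeBucket { le: number; count: number } // ascending le, cumulative counts

function histogramQuantile(q: number, buckets: CumulativeBucket[]): number {
  if (buckets.length === 0) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  let prevLe = 0;
  let prevCount = 0;
  for (const b of buckets) {
    if (b.count >= rank) {
      if (b.le === Infinity) return prevLe; // like Prometheus: last finite bound
      const inBucket = b.count - prevCount;
      if (inBucket === 0) return b.le;
      // Linear interpolation assumes observations are uniform within the bucket.
      return prevLe + (b.le - prevLe) * ((rank - prevCount) / inBucket);
    }
    prevLe = b.le;
    prevCount = b.count;
  }
  return prevLe;
}

function classify(p90: number, warnAbove: number, critAbove: number): "ok" | "warning" | "critical" {
  if (p90 > critAbove) return "critical";
  if (p90 > warnAbove) return "warning";
  return "ok";
}
```

For `StellaTimeToFirstReleaseHigh`, the thresholds would be 4 hours (14400 s) and 24 hours (86400 s).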
### P0M-007 - Documentation
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004, P0M-005, P0M-006
Owners: Documentation author
Task description:
Document the P0 metrics:
- Add metrics to `docs/modules/telemetry/guides/p0-metrics.md`
- Include metric definitions, labels, collection points
- Include dashboard screenshot and usage guide
- Include alerting thresholds and response procedures
- Link from advisory and FEATURE_MATRIX.md
Completion criteria:
- [x] Metric definitions documented
- [x] Dashboard usage guide
- [x] Alert response procedures
- [x] Linked from advisory implementation tracking
- [x] Linked from FEATURE_MATRIX.md
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | P0M-001 through P0M-006 completed. P0ProductMetrics.cs, InstallTimestampService.cs, Grafana dashboard, and alert rules implemented. Tests added. | Developer |
| 2026-01-17 | P0M-007 completed. docs/modules/telemetry/guides/p0-metrics.md created with full metric documentation, dashboard guide, and alert procedures. | Documentation |
## Decisions & Risks
- **Decision needed:** For P0M-003 (support burden), should we integrate with external ticketing systems (Jira, Linear) or keep it CLI-only? Recommend: CLI-only initially, add integrations later.
- **Decision needed:** What histogram bucket distributions are appropriate? Recommend: Start with proposed buckets, refine based on real data.
- **Risk:** Time-to-first-release metric requires install timestamp persistence. If DB is wiped, metric resets. Mitigation: Accept this limitation; document in metric description.
- **Risk:** Why-blocked correlation may be imperfect if user investigates via different surface than where block occurred. Mitigation: Track best-effort, note limitation in docs.
## Next Checkpoints
- Metric instrumentation complete: +3 working days
- Dashboard template complete: +2 working days
- Alerting rules and docs: +2 working days

@@ -0,0 +1,749 @@
# Drift Remediation Automation
## Overview
Drift Remediation Automation extends the existing drift detection system with intelligent, policy-driven automatic remediation. While drift detection identifies divergence between expected and actual state, remediation automation closes the loop by taking corrective action without manual intervention.
The design balances automation with safety through configurable remediation strategies, severity-based prioritization, and comprehensive audit trails.
---
## Design Principles
1. **Safety First**: Auto-remediation never executes without explicit policy authorization
2. **Gradual Escalation**: Start with notifications, escalate to remediation based on drift age/severity
3. **Deterministic Actions**: Remediation produces identical outcomes for identical drift states
4. **Full Auditability**: Every remediation action generates signed evidence packets
5. **Blast Radius Control**: Limit concurrent remediations; prevent cascading failures
6. **Human Override**: Operators can pause, cancel, or override any remediation
---
## Architecture
### Component Overview
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                           Drift Remediation System                           │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────────────┐    ┌────────────────────┐    ┌────────────────────┐  │
│  │ DriftDetector      │───▶│ RemediationEngine  │───▶│ ActionExecutor     │  │
│  │ (existing)         │    │                    │    │                    │  │
│  └────────────────────┘    └────────────────────┘    └────────────────────┘  │
│            │                         │                         │             │
│            ▼                         ▼                         ▼             │
│  ┌────────────────────┐    ┌────────────────────┐    ┌────────────────────┐  │
│  │ SeverityScorer     │    │ PolicyEvaluator    │    │ EvidenceWriter     │  │
│  └────────────────────┘    └────────────────────┘    └────────────────────┘  │
│            │                         │                         │             │
│            ▼                         ▼                         ▼             │
│  ┌────────────────────┐    ┌────────────────────┐    ┌────────────────────┐  │
│  │ AlertRouter        │    │ ReconcileScheduler │    │ MetricsEmitter     │  │
│  └────────────────────┘    └────────────────────┘    └────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
```
### Key Components
#### 1. SeverityScorer
Calculates drift severity based on multiple weighted factors:
```csharp
public sealed record DriftSeverity
{
    public DriftSeverityLevel Level { get; init; }   // Critical, High, Medium, Low, Info
    public int Score { get; init; }                  // 0-100 numeric score
    public ImmutableArray<SeverityFactor> Factors { get; init; }
    public TimeSpan DriftAge { get; init; }
    public bool RequiresImmediate { get; init; }
}

public enum DriftSeverityLevel
{
    Info = 0,        // Cosmetic differences (labels, annotations)
    Low = 25,        // Non-critical drift (resource limits changed)
    Medium = 50,     // Functional drift (ports, volumes)
    High = 75,       // Security drift (image digest mismatch)
    Critical = 100   // Severe drift (container missing, wrong image)
}
```
**Severity Factors:**
| Factor | Weight | Description |
|--------|--------|-------------|
| Drift Type | 30% | Missing > Digest Mismatch > Status Mismatch > Unexpected |
| Drift Age | 25% | Older drift = higher severity |
| Environment Criticality | 20% | Production > Staging > Development |
| Component Criticality | 15% | Core services weighted higher |
| Blast Radius | 10% | Number of dependent services affected |
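Because the five weights sum to 100%, the score is a weighted average and stays within 0 to 100. Writing $T, A, E, C, B$ for the drift-type, age, environment, component, and blast-radius factor scores (each 0 to 100, as assigned in the Severity Scoring Algorithm section), an illustrative case of a digest mismatch detected two hours ago on a production component with no configured criticality and five dependents works out as:

```math
\begin{aligned}
S &= 0.30\,T + 0.25\,A + 0.20\,E + 0.15\,C + 0.10\,B \\
  &= 0.30(80) + 0.25(70) + 0.20(100) + 0.15(50) + 0.10(60) \\
  &= 24 + 17.5 + 20 + 7.5 + 6 = 75
\end{aligned}
```

A score of 75 coincides with the `High` enum value, although the exact score-to-level mapping depends on `ScoreToLevel`, which this document does not define.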
#### 2. RemediationPolicy
Defines when and how to remediate drift:
```csharp
public sealed record RemediationPolicy
{
    public Guid Id { get; init; }
    public string Name { get; init; }
    public Guid EnvironmentId { get; init; }

    // Triggers
    public RemediationTrigger Trigger { get; init; }
    public DriftSeverityLevel MinimumSeverity { get; init; }
    public TimeSpan MinimumDriftAge { get; init; }
    public TimeSpan MaximumDriftAge { get; init; }  // Escalate to manual if exceeded

    // Actions
    public RemediationAction Action { get; init; }
    public RemediationStrategy Strategy { get; init; }

    // Safety limits
    public int MaxConcurrentRemediations { get; init; }
    public int MaxRemediationsPerHour { get; init; }
    public TimeSpan CooldownPeriod { get; init; }

    // Schedule
    public RemediationWindow? MaintenanceWindow { get; init; }
    public ImmutableArray<DayOfWeek> AllowedDays { get; init; }
    public TimeOnly AllowedStartTime { get; init; }
    public TimeOnly AllowedEndTime { get; init; }

    // Notifications
    public NotificationConfig Notifications { get; init; }
}

public enum RemediationTrigger
{
    Immediate,           // Remediate as soon as detected
    Scheduled,           // Wait for maintenance window
    AgeThreshold,        // Remediate after drift exceeds age
    SeverityEscalation,  // Remediate when severity increases
    Manual               // Notification only, human initiates
}

public enum RemediationAction
{
    NotifyOnly,  // Alert but don't act
    Reconcile,   // Restore to expected state
    Rollback,    // Rollback to previous known-good release
    Scale,       // Adjust replica count
    Restart,     // Restart containers
    Quarantine   // Isolate drifted targets from traffic
}

public enum RemediationStrategy
{
    AllAtOnce,  // Remediate all drifted targets simultaneously
    Rolling,    // Remediate one at a time with health checks
    Canary,     // Remediate one, verify, then proceed
    BlueGreen   // Deploy to standby, switch traffic
}
```
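`AllowedDays`, `AllowedStartTime`, and `AllowedEndTime` together define when a policy may act. A sketch of that check, assuming times are evaluated in UTC and that an end time earlier than the start time means the window wraps past midnight, with the early-morning part attributed to the previous day. Both rules are assumptions the record does not pin down; `WindowPolicy` and `isWithinWindow` are hypothetical.

```typescript
interface WindowPolicy {
  allowedDays: number[]; // 0 = Sunday ... 6 = Saturday (same numbering as .NET DayOfWeek)
  startMinute: number;   // minutes after midnight, inclusive
  endMinute: number;     // minutes after midnight, exclusive
}

function isWithinWindow(now: Date, p: WindowPolicy): boolean {
  const day = now.getUTCDay();
  const minute = now.getUTCHours() * 60 + now.getUTCMinutes();
  if (p.startMinute === p.endMinute) return false; // treated as an empty window (assumption)
  if (p.startMinute < p.endMinute) {
    return p.allowedDays.includes(day) && minute >= p.startMinute && minute < p.endMinute;
  }
  // Wrapping window such as 22:00-02:00: the late part belongs to today,
  // the early part belongs to the window that opened yesterday.
  if (minute >= p.startMinute) return p.allowedDays.includes(day);
  if (minute < p.endMinute) return p.allowedDays.includes((day + 6) % 7);
  return false;
}
```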
#### 3. RemediationEngine
Orchestrates the remediation process:
```csharp
public sealed class RemediationEngine
{
    public async Task<RemediationPlan> CreatePlanAsync(
        DriftReport driftReport,
        RemediationPolicy policy,
        CancellationToken ct)
    {
        // 1. Score severity for each drift item
        var scoredDrifts = await _severityScorer.ScoreAsync(driftReport.Items, ct);

        // 2. Filter by policy thresholds
        var actionable = scoredDrifts
            .Where(d => d.Severity.Level >= policy.MinimumSeverity)
            .Where(d => d.Severity.DriftAge >= policy.MinimumDriftAge)
            .ToImmutableArray();

        // 3. Check maintenance window
        if (!IsWithinMaintenanceWindow(policy))
            return RemediationPlan.Deferred(actionable, policy.MaintenanceWindow);

        // 4. Check rate limits
        var allowed = await CheckRateLimitsAsync(actionable, policy, ct);

        // 5. Build execution plan
        return BuildExecutionPlan(allowed, policy);
    }

    public async Task<RemediationResult> ExecuteAsync(
        RemediationPlan plan,
        CancellationToken ct)
    {
        // Blast radius control: cap concurrent remediations across the plan
        using var semaphore = new SemaphoreSlim(plan.Policy.MaxConcurrentRemediations);

        // Batches run sequentially and results are appended only after
        // Task.WhenAll completes, so a plain List is safe here
        // (ConcurrentBag<T> has no AddRange).
        var results = new List<TargetRemediationResult>();

        foreach (var batch in plan.Batches)
        {
            var tasks = batch.Targets.Select(async target =>
            {
                await semaphore.WaitAsync(ct);
                try
                {
                    return await RemediateTargetAsync(target, plan, ct);
                }
                finally
                {
                    semaphore.Release();
                }
            });

            var batchResults = await Task.WhenAll(tasks);
            results.AddRange(batchResults);

            // Health check between batches for rolling strategy
            if (plan.Policy.Strategy == RemediationStrategy.Rolling)
            {
                await VerifyBatchHealthAsync(batchResults, ct);
            }
        }

        // Generate evidence
        var evidence = await _evidenceWriter.WriteAsync(plan, results, ct);
        return new RemediationResult(plan.Id, results.ToImmutableArray(), evidence);
    }
}
```
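`BuildExecutionPlan` is not shown. One plausible batching, assuming targets are ordered by descending severity score with the target id as a deterministic tie-breaker: `AllAtOnce` yields one batch, `Rolling` one target per batch, and `Canary` a single-target batch followed by the rest. `BlueGreen` is left out because it switches traffic rather than batching targets. All names are hypothetical, and concurrency within a batch would still be capped by `MaxConcurrentRemediations`.

```typescript
type Strategy = "AllAtOnce" | "Rolling" | "Canary";

interface Target { id: string; score: number }
interface Batch { order: number; targets: Target[]; requiresHealthCheck: boolean }

function buildBatches(targets: Target[], strategy: Strategy): Batch[] {
  // Highest severity first; id as tie-breaker keeps the plan deterministic.
  const sorted = [...targets].sort(
    (a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
  if (sorted.length === 0) return [];
  const groups: Target[][] =
    strategy === "AllAtOnce" ? [sorted]
    : strategy === "Rolling" ? sorted.map((t) => [t])
    : sorted.length > 1 ? [sorted.slice(0, 1), sorted.slice(1)] : [sorted]; // Canary
  return groups.map((g, i) => ({
    order: i,
    targets: g,
    // Incremental strategies verify health between batches (not after the last).
    requiresHealthCheck: strategy !== "AllAtOnce" && i < groups.length - 1,
  }));
}
```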
#### 4. ReconcileScheduler
Manages scheduled reconciliation runs:
```csharp
public sealed class ReconcileScheduler
{
    private readonly TimeProvider _timeProvider;
    private readonly IRemediationPolicyStore _policyStore;
    private readonly IDriftDetector _driftDetector;
    private readonly RemediationEngine _engine;

    public async Task RunScheduledReconciliationAsync(CancellationToken ct)
    {
        var policies = await _policyStore.GetScheduledPoliciesAsync(ct);

        foreach (var policy in policies)
        {
            if (!IsWithinWindow(policy))
                continue;

            // Detect drift
            var inventory = await _inventoryService.GetCurrentAsync(policy.EnvironmentId, ct);
            var expected = await _releaseService.GetExpectedStateAsync(policy.EnvironmentId, ct);
            var drift = _driftDetector.Detect(inventory, expected);

            if (drift.HasDrift)
            {
                var plan = await _engine.CreatePlanAsync(drift, policy, ct);
                await _engine.ExecuteAsync(plan, ct);
            }
        }
    }
}
```
---
## Data Models
### RemediationPlan
```csharp
public sealed record RemediationPlan
{
    public Guid Id { get; init; }
    public Guid DriftReportId { get; init; }
    public RemediationPolicy Policy { get; init; }
    public RemediationPlanStatus Status { get; init; }
    public ImmutableArray<RemediationBatch> Batches { get; init; }
    public DateTimeOffset CreatedAt { get; init; }
    public DateTimeOffset? ScheduledFor { get; init; }
    public DateTimeOffset? StartedAt { get; init; }
    public DateTimeOffset? CompletedAt { get; init; }
    public string? DeferralReason { get; init; }
}

public enum RemediationPlanStatus
{
    Created,
    Scheduled,
    Deferred,        // Waiting for maintenance window
    Running,
    Paused,          // Human intervention requested
    Succeeded,
    PartialSuccess,  // Some targets remediated, some failed
    Failed,
    Cancelled
}

public sealed record RemediationBatch
{
    public int Order { get; init; }
    public ImmutableArray<RemediationTarget> Targets { get; init; }
    public TimeSpan? DelayAfter { get; init; }
    public bool RequiresHealthCheck { get; init; }
}

public sealed record RemediationTarget
{
    public Guid TargetId { get; init; }
    public string TargetName { get; init; }
    public DriftItem Drift { get; init; }
    public DriftSeverity Severity { get; init; }
    public RemediationAction Action { get; init; }
    public string? ActionPayload { get; init; }  // Compose file, rollback digest, etc.
}
```
### RemediationResult
```csharp
public sealed record RemediationResult
{
    public Guid PlanId { get; init; }
    public RemediationResultStatus Status { get; init; }
    public ImmutableArray<TargetRemediationResult> TargetResults { get; init; }
    public Guid EvidencePacketId { get; init; }
    public TimeSpan Duration { get; init; }
    public RemediationMetrics Metrics { get; init; }
}

public sealed record TargetRemediationResult
{
    public Guid TargetId { get; init; }
    public RemediationTargetStatus Status { get; init; }
    public string? Error { get; init; }
    public TimeSpan Duration { get; init; }
    public string? PreviousDigest { get; init; }
    public string? CurrentDigest { get; init; }
    public ImmutableArray<string> Logs { get; init; }
}

public sealed record RemediationMetrics
{
    public int TotalTargets { get; init; }
    public int Succeeded { get; init; }
    public int Failed { get; init; }
    public int Skipped { get; init; }
    public TimeSpan TotalDuration { get; init; }
    public TimeSpan AverageTargetDuration { get; init; }
}
```
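`RemediationMetrics` and the overall status can be derived from the per-target results. A sketch, assuming target statuses of succeeded, failed, or skipped; skipped targets are excluded from the average; and the `PartialSuccess` rule follows the comment on `RemediationPlanStatus`. Durations are in milliseconds, and `summarize` is hypothetical.

```typescript
type TargetStatus = "Succeeded" | "Failed" | "Skipped";
interface TargetResult { targetId: string; status: TargetStatus; durationMs: number }

interface Metrics {
  totalTargets: number;
  succeeded: number;
  failed: number;
  skipped: number;
  totalDurationMs: number;
  averageTargetDurationMs: number;
}

function summarize(
  results: TargetResult[],
  wallClockMs: number,
): { status: "Succeeded" | "PartialSuccess" | "Failed"; metrics: Metrics } {
  const count = (s: TargetStatus) => results.filter((r) => r.status === s).length;
  const succeeded = count("Succeeded");
  const failed = count("Failed");
  const executed = results.filter((r) => r.status !== "Skipped");
  const sum = executed.reduce((acc, r) => acc + r.durationMs, 0);
  const status = failed === 0 ? "Succeeded" : succeeded > 0 ? "PartialSuccess" : "Failed";
  return {
    status,
    metrics: {
      totalTargets: results.length,
      succeeded,
      failed,
      skipped: count("Skipped"),
      // Targets may run concurrently, so total duration is wall-clock time, not the sum.
      totalDurationMs: wallClockMs,
      averageTargetDurationMs: executed.length === 0 ? 0 : sum / executed.length,
    },
  };
}
```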
---
## API Design
### REST Endpoints
```
# Policies
POST   /api/v1/remediation/policies                   # Create policy
GET    /api/v1/remediation/policies                   # List policies
GET    /api/v1/remediation/policies/{id}              # Get policy
PUT    /api/v1/remediation/policies/{id}              # Update policy
DELETE /api/v1/remediation/policies/{id}              # Delete policy
POST   /api/v1/remediation/policies/{id}/activate     # Activate policy
POST   /api/v1/remediation/policies/{id}/deactivate   # Deactivate policy

# Plans
GET    /api/v1/remediation/plans                      # List plans
GET    /api/v1/remediation/plans/{id}                 # Get plan details
POST   /api/v1/remediation/plans/{id}/execute         # Execute deferred plan
POST   /api/v1/remediation/plans/{id}/pause           # Pause running plan
POST   /api/v1/remediation/plans/{id}/resume          # Resume paused plan
POST   /api/v1/remediation/plans/{id}/cancel          # Cancel plan

# On-demand
POST   /api/v1/remediation/preview                    # Preview remediation (dry-run)
POST   /api/v1/remediation/execute                    # Execute immediate remediation

# History
GET    /api/v1/remediation/history                    # List remediation history
GET    /api/v1/remediation/history/{id}               # Get remediation result
GET    /api/v1/remediation/history/{id}/evidence      # Get evidence packet
```
### WebSocket Events
```typescript
// Real-time remediation updates
interface RemediationEvent {
  type: 'plan.created' | 'plan.started' | 'plan.completed' |
        'target.started' | 'target.completed' | 'target.failed';
  planId: string;
  targetId?: string;
  status: string;
  progress?: number;
  message?: string;
  timestamp: string;
}
```
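A UI client might fold this event stream into per-plan progress. A minimal reducer over the `RemediationEvent` interface (repeated so the sketch stands alone); the `PlanProgress` view model, the initial `"created"` status, and the rule that only `plan.*` events update the plan status are hypothetical client-side choices.

```typescript
interface RemediationEvent {
  type: 'plan.created' | 'plan.started' | 'plan.completed' |
        'target.started' | 'target.completed' | 'target.failed';
  planId: string;
  targetId?: string;
  status: string;
  progress?: number;
  message?: string;
  timestamp: string;
}

interface PlanProgress {
  status: string;
  running: string[];
  completed: string[];
  failed: string[];
}

function applyEvent(state: Record<string, PlanProgress>, e: RemediationEvent): Record<string, PlanProgress> {
  const prev = state[e.planId] ?? { status: "created", running: [], completed: [], failed: [] };
  const without = (xs: string[], id: string) => xs.filter((x) => x !== id);
  // Only plan-level events change the plan status.
  let next: PlanProgress = { ...prev, status: e.type.startsWith("plan.") ? e.status : prev.status };
  if (e.targetId !== undefined) {
    const id = e.targetId;
    if (e.type === "target.started") next = { ...next, running: [...without(next.running, id), id] };
    if (e.type === "target.completed") next = { ...next, running: without(next.running, id), completed: [...next.completed, id] };
    if (e.type === "target.failed") next = { ...next, running: without(next.running, id), failed: [...next.failed, id] };
  }
  // Return a new object so UI frameworks can detect the change by reference.
  return { ...state, [e.planId]: next };
}
```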
---
## Severity Scoring Algorithm
```csharp
public sealed class SeverityScorer
{
    private readonly SeverityScoringConfig _config;

    public DriftSeverity Score(DriftItem drift, ScoringContext context)
    {
        var factors = new List<SeverityFactor>();
        var score = 0.0;

        // Factor 1: Drift Type (30%)
        var typeScore = drift.Type switch
        {
            DriftType.Missing => 100,
            DriftType.DigestMismatch => 80,
            DriftType.StatusMismatch => 50,
            DriftType.Unexpected => 30,
            _ => 10
        };
        factors.Add(new SeverityFactor("DriftType", typeScore, 0.30));
        score += typeScore * 0.30;

        // Factor 2: Drift Age (25%)
        var ageScore = CalculateAgeScore(drift.DetectedAt, context.Now);
        factors.Add(new SeverityFactor("DriftAge", ageScore, 0.25));
        score += ageScore * 0.25;

        // Factor 3: Environment Criticality (20%)
        var envScore = context.Environment.Criticality switch
        {
            EnvironmentCriticality.Production => 100,
            EnvironmentCriticality.Staging => 60,
            EnvironmentCriticality.Development => 20,
            _ => 10
        };
        factors.Add(new SeverityFactor("EnvironmentCriticality", envScore, 0.20));
        score += envScore * 0.20;

        // Factor 4: Component Criticality (15%)
        var componentScore = context.ComponentCriticality.GetValueOrDefault(drift.ComponentId, 50);
        factors.Add(new SeverityFactor("ComponentCriticality", componentScore, 0.15));
        score += componentScore * 0.15;

        // Factor 5: Blast Radius (10%)
        var blastScore = CalculateBlastRadius(drift, context.DependencyGraph);
        factors.Add(new SeverityFactor("BlastRadius", blastScore, 0.10));
        score += blastScore * 0.10;

        return new DriftSeverity
        {
            Level = ScoreToLevel((int)score),
            Score = (int)score,
            Factors = factors.ToImmutableArray(),
            DriftAge = context.Now - drift.DetectedAt,
            RequiresImmediate = score >= 90
        };
    }

    private int CalculateAgeScore(DateTimeOffset detectedAt, DateTimeOffset now)
    {
        var age = now - detectedAt;
        return age.TotalMinutes switch
        {
            < 5 => 10,      // Under 5 minutes: very fresh, low urgency
            < 30 => 30,     // Under 30 minutes
            < 60 => 50,     // Under 1 hour
            < 240 => 70,    // Under 4 hours
            < 1440 => 85,   // Under 24 hours
            _ => 100        // 24 hours or more: critical
        };
    }

    private int CalculateBlastRadius(DriftItem drift, DependencyGraph graph)
    {
        var dependents = graph.GetDependents(drift.ComponentId);
        return dependents.Count switch
        {
            0 => 10,
            < 3 => 30,
            < 10 => 60,
            < 25 => 80,
            _ => 100
        };
    }
}
```
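Here is a worked example of the weighting, rewritten in TypeScript for illustration. Take a `Missing` drift detected two hours ago in production, with the default component criticality of 50 and five dependents. It scores 30 + 17.5 + 20 + 7.5 + 6 = 81, which is below the `RequiresImmediate` threshold of 90. The `Other` buckets stand in for the `_` fallback arms. `ScoreToLevel` is not reproduced because its thresholds are not specified here.

```typescript
// 'Other' stands in for the `_ =>` fallback arms of the C# switch expressions.
type DriftType = 'Missing' | 'DigestMismatch' | 'StatusMismatch' | 'Unexpected' | 'Other';
type EnvCriticality = 'Production' | 'Staging' | 'Development' | 'Other';

const TYPE_SCORE: Record<DriftType, number> = {
  Missing: 100, DigestMismatch: 80, StatusMismatch: 50, Unexpected: 30, Other: 10,
};
const ENV_SCORE: Record<EnvCriticality, number> = {
  Production: 100, Staging: 60, Development: 20, Other: 10,
};

// Same buckets as CalculateAgeScore (minutes since detection).
function ageScore(ageMinutes: number): number {
  if (ageMinutes < 5) return 10;
  if (ageMinutes < 30) return 30;
  if (ageMinutes < 60) return 50;
  if (ageMinutes < 240) return 70;
  if (ageMinutes < 1440) return 85;
  return 100;
}

// Same buckets as CalculateBlastRadius (number of dependents).
function blastRadiusScore(dependents: number): number {
  if (dependents === 0) return 10;
  if (dependents < 3) return 30;
  if (dependents < 10) return 60;
  if (dependents < 25) return 80;
  return 100;
}

interface SeverityInput {
  type: DriftType;
  ageMinutes: number;
  env: EnvCriticality;
  componentCriticality: number; // 0-100; the C# version defaults to 50
  dependents: number;
}

function severityScore(input: SeverityInput): { score: number; requiresImmediate: boolean } {
  // Weights: type 30%, age 25%, environment 20%, component 15%, blast radius 10%.
  const raw =
    TYPE_SCORE[input.type] * 0.30 +
    ageScore(input.ageMinutes) * 0.25 +
    ENV_SCORE[input.env] * 0.20 +
    input.componentCriticality * 0.15 +
    blastRadiusScore(input.dependents) * 0.10;
  // Truncated like the (int) cast; RequiresImmediate compares the unrounded value.
  return { score: Math.trunc(raw), requiresImmediate: raw >= 90 };
}

// 100*0.30 + 70*0.25 + 100*0.20 + 50*0.15 + 60*0.10 = 30 + 17.5 + 20 + 7.5 + 6 = 81
const example = severityScore({
  type: 'Missing', ageMinutes: 120, env: 'Production', componentCriticality: 50, dependents: 5,
});
```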
---
## Safety Mechanisms
### 1. Rate Limiting
```csharp
public sealed class RemediationRateLimiter
{
public async Task<RateLimitResult> CheckAsync(
RemediationPolicy policy,
int requestedCount,
CancellationToken ct)
{
var hourlyCount = await GetHourlyRemediationCountAsync(policy.Id, ct);
if (hourlyCount + requestedCount > policy.MaxRemediationsPerHour)
{
return RateLimitResult.Exceeded(
$"Hourly limit exceeded: {hourlyCount}/{policy.MaxRemediationsPerHour}");
}
var lastRemediation = await GetLastRemediationAsync(policy.Id, ct);
if (lastRemediation != null)
{
var timeSinceLast = _timeProvider.GetUtcNow() - lastRemediation.CompletedAt;
if (timeSinceLast < policy.CooldownPeriod)
{
return RateLimitResult.Cooldown(policy.CooldownPeriod - timeSinceLast);
}
}
return RateLimitResult.Allowed(requestedCount);
}
}
```
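The decision order in `CheckAsync` can be written as a pure function: the hourly budget is checked first, then the cooldown since the last completed remediation. A pure function makes the edge cases easy to unit-test. This is a sketch with hypothetical names. The counts and the last completion time are passed in instead of being read from a store, and all times are epoch milliseconds.

```typescript
type RateLimitResult =
  | { kind: 'allowed'; count: number }
  | { kind: 'exceeded'; reason: string }
  | { kind: 'cooldown'; retryAfterMs: number };

interface RateLimitPolicy {
  maxRemediationsPerHour: number;
  cooldownPeriodMs: number;
}

function checkRateLimit(
  policy: RateLimitPolicy,
  requestedCount: number,
  hourlyCount: number,            // remediations already performed in the last hour
  lastCompletedAt: number | null, // epoch ms of the last completed remediation, if any
  nowMs: number,
): RateLimitResult {
  // 1. Hourly budget
  if (hourlyCount + requestedCount > policy.maxRemediationsPerHour) {
    return {
      kind: 'exceeded',
      reason: `Hourly limit exceeded: ${hourlyCount}/${policy.maxRemediationsPerHour}`,
    };
  }
  // 2. Cooldown since the last completed remediation
  if (lastCompletedAt !== null) {
    const sinceLast = nowMs - lastCompletedAt;
    if (sinceLast < policy.cooldownPeriodMs) {
      return { kind: 'cooldown', retryAfterMs: policy.cooldownPeriodMs - sinceLast };
    }
  }
  return { kind: 'allowed', count: requestedCount };
}
```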
### 2. Blast Radius Control
```csharp
// Maximum percentage of targets that can be remediated in one operation
public const int MaxTargetPercentage = 25;
// Never remediate more than this many targets at once
public const int AbsoluteMaxTargets = 10;
// Minimum healthy targets required before remediation
public const double MinHealthyPercentage = 0.75;
```
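The three constants could be combined into a per-operation cap as follows. This assumes the percentage applies to the environment's total target count and that health is measured as healthy/total. The function name and the use of `Math.floor` are also assumptions. With flooring, environments with fewer than four targets get a cap of zero under the 25% rule, so a real implementation may need to handle small environments explicitly.

```typescript
const MAX_TARGET_PERCENTAGE = 25;
const ABSOLUTE_MAX_TARGETS = 10;
const MIN_HEALTHY_PERCENTAGE = 0.75;

// Returns how many targets one remediation operation may touch (0 = do not proceed).
function maxTargetsForOperation(totalTargets: number, healthyTargets: number): number {
  if (totalTargets <= 0) return 0;
  // Refuse to remediate when too much of the environment is already unhealthy.
  if (healthyTargets / totalTargets < MIN_HEALTHY_PERCENTAGE) return 0;
  const byPercentage = Math.floor((totalTargets * MAX_TARGET_PERCENTAGE) / 100);
  return Math.min(byPercentage, ABSOLUTE_MAX_TARGETS);
}
```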
### 3. Circuit Breaker
```csharp
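public sealed class RemediationCircuitBreaker
{
// Guards both fields: concurrent remediations can report results at the same time
private readonly object _sync = new();
private int _consecutiveFailures;
private DateTimeOffset? _openedAt;
// Open until OpenDuration elapses; afterwards the next attempt is allowed, and because
// only a success resets the counter, one further failure re-opens the breaker at once
public bool IsOpen
{
get
{
lock (_sync)
{
return _openedAt != null &&
(_timeProvider.GetUtcNow() - _openedAt.Value) < _config.OpenDuration;
}
}
}
public void RecordSuccess()
{
lock (_sync)
{
_consecutiveFailures = 0;
_openedAt = null;
}
}
public void RecordFailure()
{
int failures;
lock (_sync)
{
failures = ++_consecutiveFailures;
if (failures < _config.FailureThreshold)
return;
_openedAt = _timeProvider.GetUtcNow();
}
_logger.LogWarning("Remediation circuit breaker opened after {Failures} failures",
failures);
}
}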
```
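The breaker's state transitions are easiest to test with an injectable clock. The following TypeScript sketch implements the same logic. It is single-threaded, so it needs no locking, and it uses an assumed `now` function in place of `TimeProvider`. The breaker opens after `failureThreshold` consecutive failures and stays open for `openDurationMs`. After that, the next attempt is let through. Because only a success resets the counter, one more failure re-opens the breaker immediately, which acts as an implicit half-open state.

```typescript
class RemediationCircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly openDurationMs: number;
  private readonly now: () => number; // injectable clock, epoch ms

  constructor(failureThreshold: number, openDurationMs: number, now: () => number) {
    this.failureThreshold = failureThreshold;
    this.openDurationMs = openDurationMs;
    this.now = now;
  }

  get isOpen(): boolean {
    return this.openedAt !== null && this.now() - this.openedAt < this.openDurationMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.consecutiveFailures++;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openedAt = this.now(); // (re)open and restart the open period
    }
  }
}
```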
---
## Metrics & Observability
### Prometheus Metrics
```
# Counters
stella_remediation_plans_total{environment, policy, status}
stella_remediation_targets_total{environment, action, status}
stella_remediation_rate_limit_hits_total{policy}
# Histograms
stella_remediation_plan_duration_seconds{environment, strategy}
stella_remediation_target_duration_seconds{environment, action}
stella_remediation_detection_to_action_seconds{environment, severity}
# Gauges
stella_drift_items_pending_remediation{environment, severity}
stella_remediation_circuit_breaker_open{policy}
```
### Structured Logging
```json
{
"event": "remediation.target.completed",
"plan_id": "abc-123",
"target_id": "target-456",
"environment": "production",
"action": "reconcile",
"drift_type": "digest_mismatch",
"severity": "high",
"duration_ms": 4532,
"status": "succeeded",
"previous_digest": "sha256:abc...",
"current_digest": "sha256:def...",
"correlation_id": "xyz-789"
}
```
---
## Evidence Generation
Every remediation produces a sealed evidence packet:
```csharp
public sealed record RemediationEvidence
{
// What drifted
public ImmutableArray<DriftItem> DetectedDrift { get; init; }
public ImmutableArray<DriftSeverity> Severities { get; init; }
// Policy applied
public RemediationPolicy Policy { get; init; }
// Plan executed
public RemediationPlan Plan { get; init; }
// Results
public ImmutableArray<TargetRemediationResult> Results { get; init; }
// Who/when
public string InitiatedBy { get; init; } // "system:auto" or user ID
public DateTimeOffset InitiatedAt { get; init; }
public DateTimeOffset CompletedAt { get; init; }
// Artifacts
public ImmutableArray<string> GeneratedArtifacts { get; init; } // Compose files, scripts
}
```
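Sealing a packet (hashing or signing it) and the golden tests on packet structure both need a serialized form that does not depend on property order. One approach, assumed here rather than taken from the design, is to sort object keys recursively before serializing. A production implementation might instead adopt RFC 8785 (JSON Canonicalization Scheme), which also fixes number and string encoding. `Object.keys(...).sort()` orders keys by UTF-16 code units, as RFC 8785 does.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes with object keys sorted at every level, so equal packets yield identical text.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so elements keep their order.
    return '[' + value.map(canonicalJson).join(',') + ']';
  }
  const keys = Object.keys(value).sort();
  return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalJson(value[k])).join(',') + '}';
}
```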
---
## Configuration
### Default Policy Template
```yaml
name: "production-auto-remediation"
environment_id: "prod-001"
trigger: age_threshold
minimum_severity: high
minimum_drift_age: "00:15:00" # 15 minutes
maximum_drift_age: "1.00:00:00" # 24 hours (1 day as a .NET TimeSpan), then escalate to manual
action: reconcile
strategy: rolling
safety:
max_concurrent_remediations: 2
max_remediations_per_hour: 10
cooldown_period: "00:05:00" # 5 minutes between remediations
schedule:
maintenance_window:
enabled: true
start: "02:00"
end: "06:00"
timezone: "UTC"
allowed_days: [monday, tuesday, wednesday, thursday, friday]
notifications:
on_plan_created: true
on_remediation_started: true
on_remediation_completed: true
on_remediation_failed: true
channels:
- type: slack
channel: "#ops-alerts"
- type: email
recipients: ["ops-team@example.com"]
```
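This sketch shows how the scheduling and drift-age settings might combine into a single decision for an automatic remediation:

- Escalate once the drift is older than `maximum_drift_age`.
- Wait while it is younger than `minimum_drift_age`.
- Otherwise, act only inside the maintenance window on an allowed day.

The ordering, the decision names, and the handling of windows that wrap past midnight (the day is taken at evaluation time) are all assumptions. Times are evaluated in UTC, as configured.

```typescript
const DAYS = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];

interface MaintenanceWindow {
  start: string;         // "HH:MM" in UTC
  end: string;           // "HH:MM" in UTC
  allowedDays: string[]; // lowercase day names, as in the YAML
}

function minutesOfDay(hhmm: string): number {
  const parts = hhmm.split(':').map(Number);
  return parts[0] * 60 + parts[1];
}

function inMaintenanceWindow(w: MaintenanceWindow, at: Date): boolean {
  if (w.allowedDays.indexOf(DAYS[at.getUTCDay()]) === -1) return false;
  const now = at.getUTCHours() * 60 + at.getUTCMinutes();
  const start = minutesOfDay(w.start);
  const end = minutesOfDay(w.end);
  // A window such as 22:00-02:00 wraps past midnight.
  return start <= end ? now >= start && now < end : now >= start || now < end;
}

type Decision = 'wait' | 'defer_to_window' | 'remediate' | 'escalate_manual';

function decide(driftAgeMs: number, minAgeMs: number, maxAgeMs: number,
                w: MaintenanceWindow, at: Date): Decision {
  if (driftAgeMs > maxAgeMs) return 'escalate_manual'; // too old for automatic handling
  if (driftAgeMs < minAgeMs) return 'wait';            // drift may still be transient
  return inMaintenanceWindow(w, at) ? 'remediate' : 'defer_to_window';
}
```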
---
## Test Strategy
### Unit Tests
- Severity scoring with various drift combinations
- Rate limiting logic
- Circuit breaker state transitions
- Policy evaluation with edge cases
### Integration Tests
- Full remediation flow: detect → plan → execute → verify
- Maintenance window enforcement
- Rate limit enforcement across multiple requests
- Evidence packet generation and signing
### Chaos Tests
- Agent failure during remediation
- Database unavailability during plan execution
- Concurrent remediation requests
- Clock skew handling
### Golden Tests
- Deterministic severity scores for fixed inputs
- Deterministic plan generation for fixed drift reports
- Evidence packet structure validation
---
## Migration Path
### Phase 1: Foundation (Week 1-2)
- Severity scoring service
- Remediation policy model and store
- Basic API endpoints
### Phase 2: Engine (Week 3-4)
- Remediation engine implementation
- Plan creation and execution
- Target remediation logic
### Phase 3: Safety (Week 5)
- Rate limiting
- Circuit breaker
- Blast radius controls
### Phase 4: Scheduling (Week 6)
- Maintenance window support
- Scheduled reconciliation
- Age-based escalation
### Phase 5: Observability (Week 7)
- Metrics emission
- Evidence generation
- Alert integration
### Phase 6: UI & Polish (Week 8)
- Web console integration
- Real-time updates
- Policy management UI

# Performance Optimizations
## Overview
Performance Optimizations equips the Release Orchestrator to handle enterprise-scale deployments. The enhancement adds parallel gate evaluation, bulk digest resolution, agent task batching, optimized database queries, and multi-level caching with tag-based invalidation.
The goal is to reduce latency, increase throughput, and keep the system scaling efficiently under load.
---
## Design Principles
1. **Measure First**: Optimize based on profiling data, not assumptions
2. **Parallel by Default**: Concurrent execution where dependencies allow
3. **Cache Intelligently**: Cache at the right level with proper invalidation
4. **Batch Operations**: Reduce round-trips through batching
5. **Async Everything**: Non-blocking operations throughout
6. **Graceful Degradation**: Performance degrades linearly, not exponentially
---
## Architecture
### Component Overview
```
┌────────────────────────────────────────────────────────────────────────┐
│                    Performance Optimization System                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌──────────────────┐   ┌───────────────────┐   ┌─────────────────┐    │
│  │ ParallelGate     │   │ BulkDigestResolver│   │ QueryOptimizer  │    │
│  │ Evaluator        │   │                   │   │                 │    │
│  └──────────────────┘   └───────────────────┘   └─────────────────┘    │
│           │                       │                      │             │
│           ▼                       ▼                      ▼             │
│  ┌──────────────────┐   ┌───────────────────┐   ┌─────────────────┐    │
│  │ TaskBatcher      │   │ CacheManager      │   │ ConnectionPool  │    │
│  │                  │   │                   │   │                 │    │
│  └──────────────────┘   └───────────────────┘   └─────────────────┘    │
│           │                       │                      │             │
│           ▼                       ▼                      ▼             │
│  ┌──────────────────┐   ┌───────────────────┐   ┌─────────────────┐    │
│  │ Prefetcher       │   │ IndexManager      │   │ LoadBalancer    │    │
│  │                  │   │                   │   │                 │    │
│  └──────────────────┘   └───────────────────┘   └─────────────────┘    │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
```
### Key Components
#### 1. ParallelGateEvaluator
Evaluates independent gates concurrently, in stages ordered by their dependencies:
```csharp
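public sealed class ParallelGateEvaluator
{
private readonly ImmutableArray<IGateEvaluator> _evaluators;
private readonly SemaphoreSlim _concurrencyLimiter;
private readonly IGateResultCache _cache;
private readonly TimeProvider _timeProvider;
public ParallelGateEvaluator(
ParallelGateConfig config,
IEnumerable<IGateEvaluator> evaluators,
IGateResultCache cache,
TimeProvider timeProvider)
{
_concurrencyLimiter = new SemaphoreSlim(config.MaxConcurrentEvaluations);
_evaluators = evaluators.ToImmutableArray();
_cache = cache;
_timeProvider = timeProvider;
}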
public async Task<GateEvaluationResult> EvaluateAllAsync(
PromotionContext context,
IReadOnlyList<GateDefinition> gates,
CancellationToken ct)
{
var result = new GateEvaluationResult
{
PromotionId = context.PromotionId,
StartedAt = _timeProvider.GetUtcNow()
};
// Group gates by dependency
var executionPlan = BuildExecutionPlan(gates);
foreach (var stage in executionPlan.Stages)
{
// Execute all gates in this stage concurrently
var stageTasks = stage.Gates.Select(async gate =>
{
await _concurrencyLimiter.WaitAsync(ct);
try
{
return await EvaluateSingleGateAsync(gate, context, ct);
}
finally
{
_concurrencyLimiter.Release();
}
});
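var stageResults = await Task.WhenAll(stageTasks);
result.GateResults.AddRange(stageResults);
// Check for failures that should stop evaluation (materialized once)
var failures = stageResults
.Where(r => r.Status == GateStatus.Failed && r.Gate.StopOnFailure)
.ToList();
if (failures.Count > 0)
{
result.Status = GateEvaluationStatus.Failed;
result.FailedGates = failures.Select(f => f.Gate.Id).ToImmutableArray();
break;
}
}
// Fail closed: any failed or errored gate fails the evaluation; otherwise record success
if (result.Status != GateEvaluationStatus.Failed)
{
var unsuccessful = result.GateResults
.Where(r => r.Status is GateStatus.Failed or GateStatus.Error)
.Select(r => r.GateId)
.ToImmutableArray();
result.Status = unsuccessful.IsEmpty
? GateEvaluationStatus.Succeeded
: GateEvaluationStatus.Failed;
result.FailedGates = unsuccessful;
}
result.CompletedAt = _timeProvider.GetUtcNow();
return result;
}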
private async Task<SingleGateResult> EvaluateSingleGateAsync(
GateDefinition gate,
PromotionContext context,
CancellationToken ct)
{
// Check cache first
var cacheKey = BuildCacheKey(gate, context);
var cached = await _cache.GetAsync(cacheKey, ct);
if (cached != null && !IsExpired(cached, gate.CacheTtl))
{
return cached with { FromCache = true };
}
// Evaluate. The evaluator lookup is inside try so an unsupported gate type becomes an Error result
var sw = Stopwatch.StartNew();
try
{
var evaluator = _evaluators.First(e => e.CanEvaluate(gate.Type));
var result = await evaluator.EvaluateAsync(gate, context, ct);
sw.Stop();
result = result with
{
EvaluationDuration = sw.Elapsed,
EvaluatedAt = _timeProvider.GetUtcNow()
};
// Cache result
await _cache.SetAsync(cacheKey, result, gate.CacheTtl, ct);
return result;
}
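catch (Exception ex) when (ex is not OperationCanceledException)
{
// Evaluator faults become an Error result; cancellation still propagates to the caller
return new SingleGateResult
{
GateId = gate.Id,
Gate = gate,
Status = GateStatus.Error,
Error = ex.Message,
EvaluationDuration = sw.Elapsed
};
}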
}
private GateExecutionPlan BuildExecutionPlan(IReadOnlyList<GateDefinition> gates)
{
var plan = new GateExecutionPlan();
var remaining = gates.ToList();
var completed = new HashSet<Guid>();
while (remaining.Any())
{
// Find gates with all dependencies satisfied
var ready = remaining
.Where(g => g.DependsOn.All(d => completed.Contains(d)))
.ToList();
if (!ready.Any())
{
throw new CircularDependencyException(remaining.Select(g => g.Id));
}
plan.Stages.Add(new GateExecutionStage { Gates = ready.ToImmutableArray() });
foreach (var gate in ready)
{
completed.Add(gate.Id);
remaining.Remove(gate);
}
}
return plan;
}
}
```
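`BuildExecutionPlan` is a level-by-level topological sort, a layered variant of Kahn's algorithm. Each stage holds every gate whose dependencies are already satisfied. An iteration that finds no ready gate means there is a cycle. The TypeScript version below follows the same procedure, with string IDs, hypothetical gate names, and a plain `Error` in place of `CircularDependencyException`. As in the C# version, a dependency on a gate that is not in the list is also reported as a cycle.

```typescript
interface GateDef {
  id: string;
  dependsOn: string[];
}

// Groups gates into stages; gates in the same stage do not depend on each other
// and can be evaluated concurrently.
function buildExecutionPlan(gates: GateDef[]): string[][] {
  const stages: string[][] = [];
  const completed = new Set<string>();
  let remaining = gates.slice();
  while (remaining.length > 0) {
    // Gates whose dependencies were all completed in earlier stages
    const ready = remaining.filter(g => g.dependsOn.every(d => completed.has(d)));
    if (ready.length === 0) {
      throw new Error('Circular or unresolved gate dependency: ' + remaining.map(g => g.id).join(', '));
    }
    stages.push(ready.map(g => g.id));
    ready.forEach(g => completed.add(g.id));
    remaining = remaining.filter(g => !completed.has(g.id));
  }
  return stages;
}
```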
#### 2. BulkDigestResolver
Resolves multiple image digests in parallel:
```csharp
public sealed class BulkDigestResolver
{
private readonly IRegistryClientPool _clientPool;
private readonly IDigestCache _cache;
private readonly int _maxConcurrency;
private readonly TimeSpan _cacheTtl;
public async Task<IReadOnlyDictionary<string, string>> ResolveAllAsync(
IReadOnlyList<ImageReference> images,
CancellationToken ct)
{
var results = new ConcurrentDictionary<string, string>();
// Check cache first
var uncached = new List<ImageReference>();
foreach (var image in images)
{
var cached = await _cache.GetAsync(image.FullReference, ct);
if (cached != null)
{
results[image.FullReference] = cached;
}
else
{
uncached.Add(image);
}
}
if (!uncached.Any())
{
return results.ToImmutableDictionary();
}
// Group by registry for connection reuse
var byRegistry = uncached.GroupBy(i => i.Registry);
await Parallel.ForEachAsync(
byRegistry,
new ParallelOptions { MaxDegreeOfParallelism = _maxConcurrency, CancellationToken = ct },
async (group, ct) =>
{
var client = await _clientPool.GetClientAsync(group.Key, ct);
try
{
// Batch resolve for this registry
var digests = await client.ResolveDigestsAsync(
group.Select(i => (i.Repository, i.Tag)).ToList(), ct);
foreach (var (image, digest) in group.Zip(digests))
{
results[image.FullReference] = digest;
await _cache.SetAsync(image.FullReference, digest, _cacheTtl, ct);
}
}
finally
{
_clientPool.ReturnClient(client);
}
});
return results.ToImmutableDictionary();
}
}
public interface IRegistryClient
{
// Single resolution
Task<string> ResolveDigestAsync(string repository, string tag, CancellationToken ct);
// Batch resolution (more efficient); digests must come back in input order,
// because callers zip them onto the requested images
Task<IReadOnlyList<string>> ResolveDigestsAsync(
IReadOnlyList<(string Repository, string Tag)> images,
CancellationToken ct);
}
```
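Some bookkeeping happens before any registry call: cache hits are served directly, and misses are grouped by registry so that each registry receives one batched request. That step can be sketched synchronously. The `ImageRef` shape, the reference format, and the use of a `Map` as the cache are simplifying assumptions. The real resolver also bounds concurrency across registries.

```typescript
interface ImageRef {
  registry: string;
  repository: string;
  tag: string;
}

const fullReference = (i: ImageRef) => `${i.registry}/${i.repository}:${i.tag}`;

// Splits images into cache hits (reference -> digest) and misses grouped by registry.
function planResolution(images: ImageRef[], cache: Map<string, string>) {
  const resolved = new Map<string, string>();
  const missesByRegistry = new Map<string, ImageRef[]>();
  for (const image of images) {
    const ref = fullReference(image);
    const cached = cache.get(ref);
    if (cached !== undefined) {
      resolved.set(ref, cached);
      continue;
    }
    // One group per registry, so each registry can be asked once in a batch
    const group = missesByRegistry.get(image.registry);
    if (group) group.push(image);
    else missesByRegistry.set(image.registry, [image]);
  }
  return { resolved, missesByRegistry };
}
```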
#### 3. TaskBatcher
Batches agent tasks for efficiency:
```csharp
public sealed class TaskBatcher
{
private readonly ConcurrentDictionary<Guid, TaskBatch> _batches = new();
private readonly TimeSpan _batchWindow;
private readonly int _maxBatchSize;
public async Task<Guid> EnqueueAsync(
AgentTask task,
CancellationToken ct)
{
var agentId = task.TargetAgentId;
TaskBatch batch;
while (true)
{
// Get or create the open batch for this agent
batch = _batches.GetOrAdd(agentId, _ => new TaskBatch
{
AgentId = agentId,
CreatedAt = _timeProvider.GetUtcNow(),
Tasks = new ConcurrentBag<AgentTask>()
});
// Lock so a concurrent flush cannot take the batch between lookup and add;
// if a flush has already sealed it, retry with a fresh batch
lock (batch)
{
if (batch.IsSealed)
continue;
batch.Tasks.Add(task);
}
break;
}
// Check if batch should be sent. Size and priority are checked on enqueue; the time
// window also needs a periodic background flush, or an idle batch is never sent.
if (ShouldFlushBatch(batch))
{
await FlushBatchAsync(agentId, ct);
}
return batch.Id;
}
private bool ShouldFlushBatch(TaskBatch batch)
{
// Flush if max size reached
if (batch.Tasks.Count >= _maxBatchSize)
return true;
// Flush if batch window expired
if (_timeProvider.GetUtcNow() - batch.CreatedAt >= _batchWindow)
return true;
// Flush if high-priority task added
if (batch.Tasks.Any(t => t.Priority == TaskPriority.Immediate))
return true;
return false;
}
private async Task FlushBatchAsync(Guid agentId, CancellationToken ct)
{
if (!_batches.TryRemove(agentId, out var batch))
return;
AgentTask[] tasks;
lock (batch)
{
// Seal so late enqueuers retry instead of adding to a batch that is being sent
batch.IsSealed = true;
tasks = batch.Tasks.ToArray();
}
if (!tasks.Any())
return;
_logger.LogDebug(
"Flushing batch of {Count} tasks to agent {AgentId}",
tasks.Length, agentId);
// Group tasks by type for optimized execution
var grouped = tasks.GroupBy(t => t.TaskType);
foreach (var group in grouped)
{
var batchedPayload = CreateBatchedPayload(group.ToList());
await _agentClient.SendBatchAsync(agentId, batchedPayload, ct);
}
}
private BatchedTaskPayload CreateBatchedPayload(IReadOnlyList<AgentTask> tasks)
{
// Optimize payload based on task type
return tasks.First().TaskType switch
{
TaskType.Deploy => CreateDeployBatch(tasks),
TaskType.HealthCheck => CreateHealthCheckBatch(tasks),
TaskType.WriteSticker => CreateStickerBatch(tasks),
_ => CreateGenericBatch(tasks)
};
}
private BatchedTaskPayload CreateDeployBatch(IReadOnlyList<AgentTask> tasks)
{
// Deduplicate image pulls
var uniqueImages = tasks
.Select(t => t.Payload.Image) // one image per task, matching ImageIndex below
.Distinct()
.ToList();
return new BatchedTaskPayload
{
Type = BatchType.Deploy,
Images = uniqueImages, // Pull once, deploy many
Tasks = tasks.Select(t => new SlimTaskPayload
{
TaskId = t.Id,
ContainerName = t.Payload.ContainerName,
ImageIndex = uniqueImages.IndexOf(t.Payload.Image)
}).ToImmutableArray()
};
}
}
```
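The flush rule and the deploy-payload deduplication can be shown in isolation. A batch is flushed when it reaches `maxBatchSize`, when its window has elapsed, or as soon as it contains an `Immediate` task. Deploy batches list each image once and refer to it by index. This simplified TypeScript sketch uses illustrative field names, and a `Map` lookup replaces the repeated `IndexOf` scan.

```typescript
interface AgentTask {
  id: string;
  priority: 'Normal' | 'Immediate';
  containerName: string;
  image: string; // one image reference per deploy task
}

// Mirrors ShouldFlushBatch: size limit, elapsed window, or an Immediate task.
function shouldFlush(tasks: AgentTask[], createdAtMs: number, nowMs: number,
                     maxBatchSize: number, batchWindowMs: number): boolean {
  if (tasks.length >= maxBatchSize) return true;
  if (nowMs - createdAtMs >= batchWindowMs) return true;
  return tasks.some(t => t.priority === 'Immediate');
}

// Each distinct image appears once in `images` (pull once, deploy many);
// tasks refer to their image by index.
function createDeployBatch(tasks: AgentTask[]) {
  const images: string[] = [];
  const indexByImage = new Map<string, number>();
  const slimTasks = tasks.map(t => {
    let imageIndex = indexByImage.get(t.image);
    if (imageIndex === undefined) {
      imageIndex = images.length;
      images.push(t.image);
      indexByImage.set(t.image, imageIndex);
    }
    return { taskId: t.id, containerName: t.containerName, imageIndex };
  });
  return { images, tasks: slimTasks };
}
```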
#### 4. CacheManager
Two-level caching (in-process L1, Redis L2) with tag-based invalidation:
```csharp
public sealed class CacheManager
{
private readonly IMemoryCache _l1Cache; // In-process
private readonly IDistributedCache _l2Cache; // Redis
private readonly ICacheInvalidator _invalidator;
public async Task<T?> GetOrSetAsync<T>(
string key,
Func<CancellationToken, Task<T>> factory,
CacheOptions options,
CancellationToken ct) where T : class
{
// L1 check
if (_l1Cache.TryGetValue(key, out T? l1Value))
{
_metrics.RecordHit("l1");
return l1Value;
}
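// L2 check (GetAsync<T>/SetAsync<T> are assumed typed serialization helpers; IDistributedCache itself stores byte[])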
var l2Value = await _l2Cache.GetAsync<T>(key, ct);
if (l2Value != null)
{
_metrics.RecordHit("l2");
// Populate L1
_l1Cache.Set(key, l2Value, new MemoryCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = options.L1Ttl,
Size = EstimateSize(l2Value)
});
return l2Value;
}
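// Cache miss - compute value. Concurrent misses for the same key each run the factory; add per-key locking if the factory is expensive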
_metrics.RecordMiss();
var value = await factory(ct);
if (value != null)
{
// Set L1
_l1Cache.Set(key, value, new MemoryCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = options.L1Ttl,
Size = EstimateSize(value)
});
// Set L2
await _l2Cache.SetAsync(key, value, new DistributedCacheEntryOptions
{
AbsoluteExpirationRelativeToNow = options.L2Ttl
}, ct);
// Register for invalidation
if (options.InvalidationTags != null)
{
await _invalidator.RegisterAsync(key, options.InvalidationTags, ct);
}
}
return value;
}
public async Task InvalidateByTagAsync(string tag, CancellationToken ct)
{
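// Removes L1 entries on this instance only; other instances keep theirs until L1Ttl
// expires unless the invalidation is broadcast to them
var keys = await _invalidator.GetKeysByTagAsync(tag, ct);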
foreach (var key in keys)
{
_l1Cache.Remove(key);
await _l2Cache.RemoveAsync(key, ct);
}
await _invalidator.UnregisterTagAsync(tag, ct);
}
}
public sealed record CacheOptions
{
public TimeSpan L1Ttl { get; init; } = TimeSpan.FromMinutes(5);
public TimeSpan L2Ttl { get; init; } = TimeSpan.FromHours(1);
public ImmutableArray<string>? InvalidationTags { get; init; }
public bool AllowStale { get; init; }
}
```
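The following sketch reduces the lookup order and tag bookkeeping of `GetOrSetAsync` and `InvalidateByTagAsync` to a synchronous in-memory model for illustration. Two `Map`s stand in for the in-process and Redis tiers, and an injected clock makes TTL expiry testable. The class and its members are hypothetical. Size limits, serialization, and cross-instance invalidation are omitted.

```typescript
interface Entry<T> {
  value: T;
  expiresAt: number; // epoch ms
}

class TwoLevelCache<T> {
  private readonly l1 = new Map<string, Entry<T>>();
  private readonly l2 = new Map<string, Entry<T>>();
  private readonly keysByTag = new Map<string, Set<string>>();
  private readonly now: () => number;
  readonly stats = { l1: 0, l2: 0, miss: 0 };

  constructor(now: () => number) {
    this.now = now;
  }

  // Returns a live entry's value, dropping it if it has expired.
  private read(tier: Map<string, Entry<T>>, key: string): T | undefined {
    const e = tier.get(key);
    if (e === undefined) return undefined;
    if (e.expiresAt <= this.now()) {
      tier.delete(key);
      return undefined;
    }
    return e.value;
  }

  getOrSet(key: string, factory: () => T, l1TtlMs: number, l2TtlMs: number, tags: string[] = []): T {
    const fromL1 = this.read(this.l1, key);
    if (fromL1 !== undefined) { this.stats.l1++; return fromL1; }
    const fromL2 = this.read(this.l2, key);
    if (fromL2 !== undefined) {
      this.stats.l2++;
      this.l1.set(key, { value: fromL2, expiresAt: this.now() + l1TtlMs }); // populate L1
      return fromL2;
    }
    this.stats.miss++;
    const value = factory();
    this.l1.set(key, { value, expiresAt: this.now() + l1TtlMs });
    this.l2.set(key, { value, expiresAt: this.now() + l2TtlMs });
    for (const tag of tags) {
      let keys = this.keysByTag.get(tag);
      if (!keys) { keys = new Set<string>(); this.keysByTag.set(tag, keys); }
      keys.add(key);
    }
    return value;
  }

  invalidateByTag(tag: string): void {
    const keys = this.keysByTag.get(tag);
    if (!keys) return;
    keys.forEach(k => { this.l1.delete(k); this.l2.delete(k); });
    this.keysByTag.delete(tag);
  }
}
```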
#### 5. QueryOptimizer
Optimizes database queries:
```csharp
public sealed class QueryOptimizer
{
public async Task<IReadOnlyList<Release>> GetReleasesOptimizedAsync(
ReleaseQuery query,
CancellationToken ct)
{
// Build optimized query
var sql = new StringBuilder();
sql.AppendLine(@"
SELECT r.*,
c.name as component_name, c.digest as component_digest,
e.name as env_name, e.status as env_status
FROM releases r");
// Use indexed join strategy based on query
if (query.EnvironmentId.HasValue)
{
// Use environment index
sql.AppendLine(@"
INNER JOIN release_environments re ON r.id = re.release_id
AND re.environment_id = @EnvironmentId");
}
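// release_components is one-to-many, so LIMIT below counts joined rows, not releases;
// select the page of release ids first (e.g., in a subquery) if exact page sizes matter
sql.AppendLine(@"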
LEFT JOIN release_components c ON r.id = c.release_id
LEFT JOIN environments e ON r.current_environment_id = e.id
WHERE r.tenant_id = @TenantId");
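// Filters are expected to be served by the composite indexes created in EnsureIndexes()
if (query.Status.HasValue)
{
sql.AppendLine("AND r.status = @Status"); // idx_releases_tenant_status
}
if (query.CreatedAfter.HasValue)
{
sql.AppendLine("AND r.created_at >= @CreatedAfter"); // idx_releases_tenant_created
}
// Keyset pagination (faster than OFFSET). The cursor predicate belongs in the WHERE
// clause, before ORDER BY, and compares the (created_at, id) pair so rows that share
// a timestamp are neither skipped nor repeated
if (query.Cursor != null)
{
sql.AppendLine("AND (r.created_at, r.id) < (@CursorCreatedAt, @CursorId)");
}
// id breaks ties, matching the cursor comparison
sql.AppendLine("ORDER BY r.created_at DESC, r.id DESC");
sql.AppendLine("LIMIT @Limit");
// Execute on a read replica when stale reads are acceptable; disposing returns the connection to its pool
await using var connection = query.AllowStale
? await _connectionPool.GetReadReplicaAsync(ct)
: await _connectionPool.GetPrimaryAsync(ct);
// Assuming Dapper: cancellation is passed through CommandDefinition, not as a positional argument
var rows = await connection.Connection.QueryAsync<Release>(
new CommandDefinition(sql.ToString(), query, cancellationToken: ct));
return rows.ToList();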
}
public void EnsureIndexes()
{
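// Ensure critical indexes exist. CREATE INDEX CONCURRENTLY cannot run inside a
// transaction block, so the runner must execute each statement on its own; a failed
// concurrent build leaves an INVALID index that IF NOT EXISTS will then skip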
var requiredIndexes = new[]
{
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_releases_tenant_status ON releases(tenant_id, status)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_releases_tenant_created ON releases(tenant_id, created_at DESC)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_releases_env ON releases(current_environment_id) WHERE current_environment_id IS NOT NULL",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_components_release ON release_components(release_id)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_deployments_release ON deployments(release_id, created_at DESC)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_promotions_release ON promotions(release_id, status)",
"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_evidence_subject ON evidence_packets(subject_id, subject_type)"
};
foreach (var index in requiredIndexes)
{
_migrationRunner.EnsureIndex(index);
}
}
}
```
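Keyset pagination returns the rows that come strictly after the last row of the previous page in sort order. With `ORDER BY created_at DESC, id DESC`, "after" means a smaller `(created_at, id)` pair. PostgreSQL can express this as the row comparison `(r.created_at, r.id) < (@CursorCreatedAt, @CursorId)`. Testing the two columns independently would skip rows that have an earlier timestamp but a larger `id`. The in-memory TypeScript sketch below applies the same rule. It assumes numeric timestamps and string IDs whose ordering stands in for the database's.

```typescript
interface ReleaseRow {
  id: string;
  createdAt: number; // epoch ms
}

// ORDER BY created_at DESC, id DESC
function compareDesc(a: ReleaseRow, b: ReleaseRow): number {
  if (a.createdAt !== b.createdAt) return b.createdAt - a.createdAt;
  return a.id < b.id ? 1 : a.id > b.id ? -1 : 0;
}

// (createdAt, id) < (cursor.createdAt, cursor.id), compared lexicographically
function isAfterCursor(row: ReleaseRow, cursor: ReleaseRow): boolean {
  return row.createdAt < cursor.createdAt ||
    (row.createdAt === cursor.createdAt && row.id < cursor.id);
}

// One page: rows after the cursor (if any), in sort order, at most `limit` of them.
function page(rows: ReleaseRow[], limit: number, cursor?: ReleaseRow): ReleaseRow[] {
  return rows
    .filter(r => cursor === undefined || isAfterCursor(r, cursor))
    .sort(compareDesc)
    .slice(0, limit);
}
```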
#### 6. Prefetcher
Warms caches ahead of expected requests:
```csharp
public sealed class Prefetcher
{
public async Task PrefetchForPromotionAsync(
Guid releaseId,
Guid targetEnvironmentId,
CancellationToken ct)
{
// Prefetch in parallel
var tasks = new List<Task>
{
// Release and components
_releaseCache.WarmAsync(releaseId, ct),
// Target environment
_environmentCache.WarmAsync(targetEnvironmentId, ct),
// Gates for this environment
_gateCache.WarmForEnvironmentAsync(targetEnvironmentId, ct),
// Recent scan results
_scanCache.WarmForReleaseAsync(releaseId, ct),
// Approval policies
_policyCache.WarmForEnvironmentAsync(targetEnvironmentId, ct),
// Available agents
_agentCache.WarmForEnvironmentAsync(targetEnvironmentId, ct)
};
await Task.WhenAll(tasks);
}
public async Task PrefetchForDashboardAsync(
Guid tenantId,
CancellationToken ct)
{
// Predictive prefetch based on user behavior
var recentQueries = await _queryHistoryStore.GetRecentAsync(tenantId, ct);
var predictedQueries = _predictor.Predict(recentQueries);
foreach (var query in predictedQueries.Take(10))
{
_ = ExecuteAndCacheAsync(query, ct); // Fire and forget: exceptions are not observed here, so ExecuteAndCacheAsync should catch and log its own failures
}
}
}
```
#### 7. ConnectionPool
Connection management with primary and read-replica routing:
```csharp
public sealed class ConnectionPool
{
// Npgsql already pools physical connections per connection string; this wrapper
// routes reads between the primary and the read replicas
private readonly ObjectPool<NpgsqlConnection> _primaryPool;
// One pool per replica: an open connection's connection string cannot be changed,
// and connections must not migrate between replicas
private readonly IReadOnlyDictionary<string, ObjectPool<NpgsqlConnection>> _replicaPools;
private readonly ILoadBalancer _replicaBalancer;
public async Task<PooledConnection> GetPrimaryAsync(CancellationToken ct)
{
var connection = _primaryPool.Get();
if (connection.State != ConnectionState.Open)
{
await connection.OpenAsync(ct);
}
return new PooledConnection(connection, () => _primaryPool.Return(connection));
}
public async Task<PooledConnection> GetReadReplicaAsync(CancellationToken ct)
{
// Select replica based on load, then take a connection from that replica's own pool
var replica = _replicaBalancer.SelectReplica();
var pool = _replicaPools[replica.Host];
var connection = pool.Get();
if (connection.State != ConnectionState.Open)
{
await connection.OpenAsync(ct);
}
return new PooledConnection(connection, () => pool.Return(connection));
}
public void WarmPool()
{
// Pre-create connections
Parallel.For(0, _config.MinPoolSize, _ =>
{
var connection = new NpgsqlConnection(_config.ConnectionString);
connection.Open();
_primaryPool.Return(connection);
});
}
}
public sealed class PooledConnection : IAsyncDisposable
{
private readonly NpgsqlConnection _connection;
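private readonly Action _returnAction;
private int _returned;
public PooledConnection(NpgsqlConnection connection, Action returnAction)
{
_connection = connection;
_returnAction = returnAction;
}
public NpgsqlConnection Connection => _connection;
public ValueTask DisposeAsync()
{
// Return the connection to its pool at most once, even if disposed twice
if (Interlocked.Exchange(ref _returned, 1) == 0)
{
_returnAction();
}
return ValueTask.CompletedTask;
}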
}
```
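The performance configuration lists read replicas with weights and `load_balancing: round_robin`. Smooth weighted round-robin, the scheme used by nginx's upstream round-robin balancer, is one common way to honour weights while still interleaving picks. The sketch below shows what `SelectReplica()` could do under that assumption. The `Replica` shape and the class name are hypothetical.

```typescript
interface Replica {
  host: string;
  weight: number;
}

class SmoothWeightedRoundRobin {
  private readonly replicas: Replica[];
  private readonly current: number[];
  private readonly totalWeight: number;

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
    this.current = replicas.map(() => 0);
    this.totalWeight = replicas.reduce((sum, r) => sum + r.weight, 0);
  }

  // Each pick: add every weight to its running score, choose the highest score,
  // then subtract the total weight from the winner.
  select(): Replica {
    let best = 0;
    for (let i = 0; i < this.replicas.length; i++) {
      this.current[i] += this.replicas[i].weight;
      if (this.current[i] > this.current[best]) best = i;
    }
    this.current[best] -= this.totalWeight;
    return this.replicas[best];
  }
}
```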
---
## Performance Benchmarks
### Target Metrics
| Operation | Current | Target | Optimization |
|-----------|---------|--------|--------------|
| Gate evaluation (5 gates) | 5s (sequential) | 1.5s (parallel) | ParallelGateEvaluator |
| Digest resolution (10 images) | 10s | 2s | BulkDigestResolver |
| Promotion creation | 500ms | 100ms | Prefetching |
| Dashboard load | 2s | 500ms | Caching + Query optimization |
| Deployment start | 3s | 500ms | Task batching |
| Agent task throughput | 100/s | 1000/s | Connection pooling |
### Load Test Scenarios
```csharp
public sealed class PerformanceTests
{
[Fact]
public async Task Gate_Evaluation_Should_Complete_Under_Target()
{
// Arrange
var gates = CreateGates(count: 10);
var context = CreatePromotionContext();
// Act
var sw = Stopwatch.StartNew();
var result = await _evaluator.EvaluateAllAsync(context, gates, CancellationToken.None);
sw.Stop();
// Assert
Assert.True(sw.Elapsed < TimeSpan.FromSeconds(2));
Assert.Equal(GateEvaluationStatus.Succeeded, result.Status);
}
[Fact]
public async Task Concurrent_Promotions_Should_Scale_Linearly()
{
// Test with 1, 10, 50, 100 concurrent promotions
var results = new List<(int Count, TimeSpan Duration)>();
foreach (var count in new[] { 1, 10, 50, 100 })
{
var promotions = Enumerable.Range(0, count)
.Select(_ => CreatePromotionRequest())
.ToList();
var sw = Stopwatch.StartNew();
await Task.WhenAll(promotions.Select(p =>
_promotionService.CreateAsync(p, CancellationToken.None)));
sw.Stop();
results.Add((count, sw.Elapsed));
}
// Assert linear scaling (within 2x factor)
var baseline = results[0].Duration.TotalMilliseconds;
foreach (var (count, duration) in results.Skip(1))
{
var expectedMax = baseline * count * 2;
Assert.True(duration.TotalMilliseconds < expectedMax,
$"Count {count}: {duration.TotalMilliseconds}ms exceeded {expectedMax}ms");
}
}
}
```
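The gate-evaluation target in the Target Metrics table follows from staged execution. With enough concurrency, wall-clock time is roughly the sum, over stages, of the slowest gate in each stage, rather than the sum of all gate durations. The back-of-the-envelope sketch below ignores the concurrency limit and cache hits. Its per-gate durations are hypothetical inputs chosen to reproduce the table's 5 s and 1.5 s figures; they are not measurements.

```typescript
// Wall-clock estimate with unlimited concurrency: stages run one after another,
// and each stage lasts as long as its slowest gate.
function stagedDurationMs(stages: number[][]): number {
  return stages.reduce((total, stage) => total + Math.max(0, ...stage), 0);
}

// Baseline: every gate evaluated one after another.
function sequentialDurationMs(stages: number[][]): number {
  return stages.reduce((total, stage) => total + stage.reduce((sum, d) => sum + d, 0), 0);
}

// Hypothetical durations (ms): four independent gates, then one gate that depends on them.
const exampleStages = [[1200, 1200, 1200, 1100], [300]];
```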
---
## Configuration
### Performance Tuning Options
```yaml
performance:
# Gate evaluation
gates:
max_concurrent_evaluations: 10
evaluation_timeout: "00:00:30"
cache_ttl: "00:05:00"
# Digest resolution
digest_resolution:
max_concurrent_registries: 5
max_concurrent_per_registry: 10
cache_ttl: "01:00:00"
timeout: "00:00:30"
# Task batching
task_batching:
enabled: true
batch_window: "00:00:01"
max_batch_size: 50
# Caching
cache:
l1:
enabled: true
max_size_mb: 256
default_ttl: "00:05:00"
l2:
enabled: true
provider: redis
connection_string: "redis://localhost:6379"
default_ttl: "01:00:00"
# Database
database:
primary:
min_pool_size: 10
max_pool_size: 100
connection_timeout: "00:00:05"
read_replicas:
enabled: true
hosts:
- host: replica1.db.local
weight: 50
- host: replica2.db.local
weight: 50
load_balancing: round_robin
# Prefetching
prefetch:
enabled: true
promotion_warmup: true
dashboard_prediction: true
prediction_depth: 10
# Connection pooling
http_client:
max_connections_per_host: 100
connection_lifetime: "00:05:00"
keep_alive_timeout: "00:00:30"
# gRPC
grpc:
max_concurrent_streams: 100
keepalive_time: "00:01:00"
keepalive_timeout: "00:00:20"
```
---
## Metrics & Observability
### Prometheus Metrics
```
# Latency histograms
stella_gate_evaluation_duration_seconds{gate_type}
stella_digest_resolution_duration_seconds{registry}
stella_promotion_creation_duration_seconds
stella_deployment_start_duration_seconds
# Cache metrics
stella_cache_hits_total{level, cache}
stella_cache_misses_total{cache}
stella_cache_size_bytes{level, cache}
stella_cache_evictions_total{cache, reason}
# Connection pools
stella_connection_pool_size{pool}
stella_connection_pool_active{pool}
stella_connection_pool_wait_seconds{pool}
# Batching
stella_batch_size{operation}
stella_batch_flush_total{operation, reason}
stella_batch_latency_seconds{operation}
# Query performance
stella_query_duration_seconds{query_type}
stella_query_rows_returned{query_type}
stella_index_scan_total{table, index}
# Throughput
stella_operations_per_second{operation}
stella_concurrent_operations{operation}
```
---
## API Design
### Performance-Optimized Endpoints
```
# Batch operations
POST /api/v1/batch/digests # Bulk digest resolution
POST /api/v1/batch/releases # Bulk release creation
POST /api/v1/batch/gates # Parallel gate evaluation
# Prefetch hints
POST /api/v1/prefetch/promotion # Warm cache for promotion
POST /api/v1/prefetch/dashboard # Warm cache for dashboard
# Cache management
DELETE /api/v1/cache/invalidate # Invalidate cache entries
GET /api/v1/cache/stats # Cache statistics
# Health & metrics
GET /api/v1/performance/stats # Performance statistics
GET /api/v1/performance/slow-queries # Recent slow queries
```
---
## Test Strategy
### Unit Tests
- Parallel evaluation logic
- Batch sizing algorithms
- Cache key generation
- Query optimization rules
### Integration Tests
- Full parallel gate flow
- Cache hit/miss scenarios
- Connection pool behavior
- Batch flush triggers
### Performance Tests
- Load testing with concurrent users
- Throughput benchmarks
- Latency percentiles
- Memory usage under load
### Chaos Tests
- Cache failure scenarios
- Database failover
- Connection pool exhaustion
---
## Migration Path
### Phase 1: Measurement (Week 1)
- Add performance metrics
- Establish baselines
- Identify bottlenecks
### Phase 2: Parallel Gates (Week 2-3)
- ParallelGateEvaluator
- Execution plan builder
- Gate result caching
### Phase 3: Bulk Operations (Week 4-5)
- BulkDigestResolver
- Task batching
- Batch optimization
### Phase 4: Caching (Week 6-7)
- Multi-level cache
- Cache invalidation
- Prefetching
### Phase 5: Database (Week 8-9)
- Query optimization
- Index tuning
- Connection pooling
- Read replicas
### Phase 6: Tuning (Week 10)
- Load testing
- Parameter tuning
- Documentation

# Stella Ops On-Prem Offer
_Self-hosted release governance + reachability-aware security gating for non-Kubernetes containers. All features included. Pay only for environments and new artifacts analyzed._
## Stella Ops Suite (Orchestrator + Scanner) — self-hosted
| Tier | Monthly | Annual | Environments | New digests deep-scanned / month | Deployment targets / feature limits | Support |
| ------------ | ---------: | -----------: | -----------: | -------------------------------: | ------------: | --------------------------------------------------------------------------------------------------- |
| **Free** | - | - | 3 | 1,000 | **No limits** | Community forum; self-service doctor utilities |
| **Plus** | **$199** | **$2,189** | **10** | **10,000** | **No limits** | Same as Free |
| **Pro** | **$599** | **$6,589** | **100** | **100,000** | **No limits** | Maintainer-reviewed community forum (typical response ~3 business days); 10 tickets a month |
| **Business** | **$2,999** | **$32,989** | **1,000** | **1,000,000** | **No limits** | Email support, **24h** response window, 20 tickets a month, **fair use** on mirroring/audit confirmations |
| Add-on | Price | Notes |
| ---------------------- | -------: | ----------------------------------------------------------------- |
| **+10 support tickets** | **$249** | For bursts, incidents, or extra support without a tier upgrade |
| **+10,000 new digest deep scans** | **$249** | Burst capacity; intentionally premium pricing |
---
## What every tier includes
All tiers (including Free) include the full Stella Ops capability set:
* **Release orchestration (non-K8s containers)**: environments, promotions, approvals, rollbacks, templates, step graph (sequential/parallel), UI visualization, per-step logs.
* **Deployment execution**: Docker Compose / scripted targets; immutable generated deployment artifacts; “version sticker” written to deployment directory.
* **Security gating**: scan-on-build, gate-on-release, re-evaluation on vuln intel updates.
* **Reachability + hybrid reachability**: reduced-noise vulnerability prioritization (reachability-aware signal).
* **Attestability / verity**: evidence packets, integrity records, exportable audit trail, deterministic decision records.
* **Plugins**: SCM/CI/registry/vault/agent providers and plugin-specific steps (extensible).
* **On-prem operation**: you run it; your compute; your data; offline/air-gapped friendly.
* **Unlimited targets:** no license cap; fair use may apply to abusive automation patterns.
Only the following are tier-limited:
* **Environments:** an environment is a dev/stage/prod-like boundary with its own policy and targets.
* **New digest deep scans per month:** a deep scan is the first time Stella analyzes a given OCI digest, producing an SBOM, reachability evidence, and a verdict. **Re-evaluation** (recomputing policy and vulnerability results from stored evidence when CVE data changes) does not consume deep scans.
---
# Scanner-only and Orchestrator-only offers
Scanner and Orchestrator are also offered as separate products, following the same “all features included” principle.
## 1) Stella Scanner (on-prem)
**Annual option:** 1 month free (pay 11 months)
| Tier | Monthly | Annual | New digests deep-scanned / month | Support |
| -------------------- | ---------: | ----------: | -----------------------------------------------------------------------: | ----------------------------------------- |
| **Scanner Plus** | **$159** | **$1,749** | **10,000** (recommended: align with Suite Plus) | community only |
| **Scanner Pro** | **$399** | **$4,389** | **100,000** (align with Suite Pro) | community forum (~3 business days target) |
| **Scanner Business** | **$1,999** | **$21,989** | **1,000,000** (align with Suite Business, or a smaller “security business” tier) | email support (24h window) + fair use |
## 2) Stella Orchestrator (on-prem)
**Annual option:** 1 month free (pay 11 months)
| Tier | Monthly | Annual | Environments | Targets | Support |
| ------------------------- | ---------: | ----------: | -----------: | ------------: | ----------------------------------------- |
| **Orchestrator Plus** | **$100** | **$1,100** | **10** | **Unlimited** | community only |
| **Orchestrator Pro** | **$299** | **$3,289** | **100** | **Unlimited** | community forum (~3 business days target) |
| **Orchestrator Business** | **$1,599** | **$17,589** | **1,000** | **Unlimited** | email support (24h) + fair use |