release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions

@@ -1,91 +1,186 @@
# Key Features Capability Cards
# Key Features Capability Cards
> **Core Thesis:** Stella Ops isn't a scanner that outputs findings. It's a platform that outputs **attestable decisions that can be replayed**. That difference survives auditors, regulators, and supply-chain propagation.
> **Core Thesis:** Stella Ops Suite isn't a scanner or a deployment tool—it's a **release control plane** that produces **attestable decisions that can be replayed**. Security is a gate, not a blocker. Evidence survives auditors, regulators, and supply-chain propagation.
> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list of all platform capabilities, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability.
> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability.
---
## At a Glance
| What Competitors Do | What Stella Ops Does |
|--------------------|---------------------|
| Output findings | Output decisions with proof chains |
| What Competitors Do | What Stella Ops Suite Does |
|--------------------|---------------------------|
| CI/CD runs pipelines | Central release authority across environments |
| Deployment tools promote | Promotion with integrated security gates |
| Scanners output findings | Security gates output decisions with proof chains |
| VEX as suppression file | VEX as logical claim system (K4 lattice) |
| Reachability as badge | Reachability as signed proof |
| "+3 CVEs" reports | "Exploitability dropped 41%" semantic deltas |
| Hide unknowns | Surface and score unknowns |
| Release identity via tags | Release identity via immutable digests |
| Per-seat/per-project pricing | Pay for environments + new digests/day |
| Online-first | Offline-first with full parity |
---
Each card below pairs the headline capability with the evidence that backs it and why it matters day to day.
## Release Orchestration (Planned)
## 0. Decision Capsules — Audit-Grade Evidence Bundles
### 0. Release Control Plane
**The core moat capability.** Every scan result is sealed in a **Decision Capsule**—a content-addressed bundle containing everything needed to reproduce and verify the vulnerability decision.
**The new core capability.** Stella Ops Suite becomes the central release authority between CI and runtime targets.
| Capability | What It Does |
|------------|--------------|
| **Environment management** | Define Dev/Stage/Prod with freeze windows and approval policies |
| **Release bundles** | Compose releases from component OCI digests with semantic versioning |
| **Promotion workflows** | DAG-based workflow engine with approvals, gates, and hooks |
| **Security gates** | Scan on build, evaluate on release, re-evaluate on CVE updates |
| **Deployment execution** | Deploy to Docker/Compose/ECS/Nomad via agents or agentless |
| **Evidence packets** | Every release decision is cryptographically signed and stored |
**Why it matters:** Non-Kubernetes container teams finally get a central release authority with audit-grade evidence—without replacing their existing CI/SCM/registry stack.
### 1. Digest-First Release Identity
**Tags are mutable; digests are truth.** A release is an immutable set of OCI digests, resolved at release creation time.
```
Release: myapp-v2.3.1
Components:
api: sha256:abc123...
worker: sha256:def456...
frontend: sha256:789ghi...
```
**What this enables:**
- Tamper detection at pull time (digest mismatch = deployment failure)
- Audit trail of exactly what was deployed
- Rollback to known-good digests, not "latest" tags
- "What is deployed where" tracking with integrity
**Modules (planned):** `ReleaseManager`, `ComponentRegistry`, `VersionManager`
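
The digest-first idea above can be illustrated with a minimal Python sketch. It is not the planned `ReleaseManager` API: `Release`, `create_release`, and `sha256_digest` are hypothetical names, and raw bytes stand in for the OCI manifests a real registry would hash.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


def sha256_digest(blob: bytes) -> str:
    """Return an OCI-style digest string for a blob."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class Release:
    """Hypothetical release record: a name plus pinned component digests."""
    name: str
    components: Mapping[str, str]

    def verify(self, component: str, pulled_blob: bytes) -> bool:
        """True only if the pulled content hashes to the pinned digest."""
        return self.components[component] == sha256_digest(pulled_blob)


def create_release(name: str, resolved: dict) -> Release:
    # Digests are resolved once, at release creation time, then frozen
    # behind a read-only mapping so later tag moves cannot change them.
    pinned = {comp: sha256_digest(blob) for comp, blob in resolved.items()}
    return Release(name, MappingProxyType(pinned))
```

A deployment step would call `verify` on whatever it pulled and fail the deployment on a mismatch, which is the tamper-detection behaviour listed above.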
### 2. Promotion Workflows with Security Gates
**Security integrated into release flow, not bolted on.** Promotion requests trigger gate evaluation before deployment.
| Gate Type | What It Checks |
|-----------|---------------|
| **Security gate** | Reachable critical/high vulnerabilities |
| **Approval gate** | Required approval count, separation of duties |
| **Freeze window gate** | Environment freeze windows |
| **Policy gate** | Custom OPA/Rego policies |
| **Previous environment gate** | Release deployed to prior environment |
**Decision records include:**
- All gate results with pass/fail reasons
- Evidence refs (scan verdicts, approval records)
- Policy hash + inputs hash for replay
- "Why blocked?" explainability
**Modules (planned):** `PromotionManager`, `ApprovalGateway`, `DecisionEngine`
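
As a rough illustration of how gate results, "why blocked" reasons, and the policy and inputs hashes might fit into one decision record, here is a small Python sketch. The gate functions, field names, and policy keys are assumptions made for the example; the source does not specify the planned `DecisionEngine` schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GateResult:
    gate: str
    passed: bool
    reason: str


def stable_hash(obj) -> str:
    """Hash canonical JSON so equal inputs always produce the same hash."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def security_gate(policy: dict, inputs: dict) -> GateResult:
    found = inputs["reachable_critical"]
    limit = policy["max_reachable_critical"]
    return GateResult("security", found <= limit,
                      f"{found} reachable critical vulnerabilities (limit {limit})")


def approval_gate(policy: dict, inputs: dict) -> GateResult:
    have = len(set(inputs["approvers"]))  # count distinct approvers only
    need = policy["required_approvals"]
    return GateResult("approval", have >= need, f"{have}/{need} approvals")


def evaluate_promotion(policy: dict, inputs: dict, gates) -> dict:
    """Run every gate and build a replayable decision record."""
    results = [gate(policy, inputs) for gate in gates]
    return {
        "allowed": all(r.passed for r in results),
        "gates": [asdict(r) for r in results],
        "why_blocked": [f"{r.gate}: {r.reason}" for r in results if not r.passed],
        "policy_hash": stable_hash(policy),
        "inputs_hash": stable_hash(inputs),
    }
```

Because the record stores hashes of the exact policy and inputs, re-running `evaluate_promotion` on the same data reproduces the same record, which is the property replay depends on.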
### 3. Deployment Execution
**Deploy to non-Kubernetes targets as first-class citizens.** Agent-based or agentless deployment to Docker hosts, Compose, ECS, Nomad.
| Target Type | Deployment Method |
|-------------|-------------------|
| **Docker host** | Agent pulls and starts containers |
| **Compose host** | Agent writes `compose.stella.lock.yml` and runs `docker-compose up` |
| **ECS service** | Agent updates task definition and service |
| **Nomad job** | Agent updates job spec and submits |
| **SSH remote** | Agentless via SSH (Linux) |
| **WinRM remote** | Agentless via WinRM (Windows) |
**Generated artifacts:**
- `compose.stella.lock.yml`: Pinned digests, resolved environment refs
- `stella.version.json`: Version sticker on target for drift detection
- `release.evidence.json`: Decision record
**Modules (planned):** `DeployOrchestrator`, `Agent.*`, `ArtifactGenerator`
### 4. Progressive Delivery
**A/B releases and canary deployments.** Gradual rollout with automatic rollback on health failure.
| Strategy | Description |
|----------|-------------|
| **Immediate** | 0% → 100% instantly |
| **Canary** | 10% → 25% → 50% → 100% with health checks |
| **Blue-green** | Deploy to B, switch traffic, retire A |
| **Rolling** | 10% at a time with health checks |
**Traffic routing plugins:** Nginx, HAProxy, Traefik, AWS ALB
**Modules (planned):** `ABManager`, `TrafficRouter`, `CanaryController`
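
The canary row in the table can be read as a simple loop: shift traffic, check health, and roll back on the first failure. The Python sketch below assumes two hypothetical callbacks, `set_traffic` and `is_healthy`, standing in for a traffic routing plugin and a health probe; it is not the planned `CanaryController`.

```python
from typing import Callable, Sequence

CANARY_STEPS = (10, 25, 50, 100)  # percentages from the strategy table


def run_canary(set_traffic: Callable[[int], None],
               is_healthy: Callable[[int], bool],
               steps: Sequence[int] = CANARY_STEPS) -> bool:
    """Advance through the steps; roll back to 0% on the first failed check."""
    for percent in steps:
        set_traffic(percent)
        if not is_healthy(percent):
            set_traffic(0)  # automatic rollback
            return False
    return True
```

Passing `steps=(100,)` gives the "Immediate" strategy with the same rollback rule.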
### 5. Plugin System (Three-Surface Model)
**Extensible without core code changes.** Plugins contribute through three surfaces.
| Surface | What It Does |
|---------|--------------|
| **Manifest** | Declares what the plugin provides (integrations, steps, agents) |
| **Connector runtime** | gRPC interface for runtime operations |
| **Step provider** | Execution characteristics for workflow steps |
**Plugin types:**
- **Integration connectors:** SCM (GitHub, GitLab), CI (Actions, Jenkins), Registry (Harbor, ECR), Vault (HashiCorp, AWS Secrets)
- **Step providers:** Custom workflow steps
- **Agent types:** New deployment targets
- **Gate providers:** Custom gate evaluations
**Modules (planned):** `PluginRegistry`, `PluginLoader`, `PluginSandbox`, `PluginSDK`
---
## Security Capabilities (Operational)
### 6. Decision Capsules — Audit-Grade Evidence Bundles
**Every scan and release decision is sealed.** A Decision Capsule is a content-addressed bundle containing everything needed to reproduce and verify the decision.
| Component | What's Included |
|-----------|----------------|
| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version, lattice rules |
| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version |
| **Evidence** | Reachability proofs (static + runtime), VEX statements, binary fingerprints |
| **Outputs** | Verdicts, risk scores, remediation paths |
| **Signatures** | DSSE envelopes over all of the above |
**Why it matters:** Six months from now, an auditor can run `stella replay srm.yaml --assert-digest <sha>` and get *identical* results. This is what "audit-grade assurance" actually means.
**Why it matters:** Auditors can replay any decision bit-for-bit. This is what "audit-grade assurance" actually means.
**No competitor offers this.** Trivy, Grype, Snyk—none can replay a past scan bit-for-bit because they don't freeze feeds or produce deterministic manifests.
**Modules:** `EvidenceLocker`, `Attestor`, `Replay`
## 1. Delta SBOM Engine
### 7. Lattice Policy + OpenVEX (K4 Logic)
**Performance without sacrificing determinism.** Layer-aware ingestion keeps the SBOM catalog content-addressed; rescans only fetch new layers.
**VEX as a logical claim system, not a suppression file.** The policy engine uses Belnap K4 four-valued logic.
- **Speed:** Warm scans < 1 second; CI/CD pipelines stay fast
- **Determinism:** Replay Manifest (SRM) captures exact analyzer inputs/outputs per layer
- **Evidence:** Binary crosswalk via Build-ID mapping; `bin:{sha256}` fallbacks for stripped binaries
| State | Meaning |
|-------|---------|
| **Unknown (bottom)** | No information |
| **True** | Positive assertion |
| **False** | Negative assertion |
| **Conflict (top)** | Contradictory assertions |
**Modules:** `Scanner`, `SbomService`, `BinaryIndex`
**Why it matters:** When vendor says "not_affected" but runtime shows the function was called, you have a *conflict*—not a false positive.
---
**Modules:** `VexLens`, `TrustLatticeEngine`, `Policy`
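
The four states form a lattice under Belnap's knowledge ordering, where merging two claims is a join. The Python sketch below shows only that textbook join; the `K4` enum and `merge` helper are hypothetical names, and any trust weighting done by `TrustLatticeEngine` is not modelled here.

```python
from enum import Enum
from functools import reduce
from typing import Iterable


class K4(Enum):
    """Each state is the set of truth values asserted so far."""
    UNKNOWN = frozenset()             # bottom: no information
    TRUE = frozenset({"T"})
    FALSE = frozenset({"F"})
    CONFLICT = frozenset({"T", "F"})  # top: contradictory assertions


def join(a: K4, b: K4) -> K4:
    """Knowledge-order join: the union of the evidence behind both claims."""
    return K4(a.value | b.value)


def merge(claims: Iterable[K4]) -> K4:
    """Fold any number of claims, starting from UNKNOWN."""
    return reduce(join, claims, K4.UNKNOWN)
```

With this encoding, a vendor "not_affected" claim (`FALSE`) merged with a runtime observation that the function was called (`TRUE`) yields `CONFLICT` instead of silently picking a winner.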
## 2. Lattice Policy + OpenVEX (K4 Logic)
### 8. Signed Reachability Proofs
**VEX as a logical claim system, not a suppression file.** The policy engine uses **Belnap K4 four-valued logic** (Unknown, True, False, Conflict) to merge SBOM, advisories, VEX, and waivers.
**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE.
| What Competitors Do | What Stella Does |
|--------------------|------------------|
| VEX filters findings (boolean) | VEX is logical claims with trust weighting |
| Conflicts hidden | Conflicts are explicit state (⊤) |
| "Vendor says not_affected" = done | Vendor + runtime + reachability merged; conflicts surfaced |
| Unknown = assume safe | Unknown = first-class state with risk implications |
| Layer | What It Proves |
|-------|---------------|
| **Static** | Call graph shows path from entrypoint → vulnerable function |
| **Binary** | Compiled binary contains the symbol |
| **Runtime** | Process actually executed the code path |
**Why it matters:** When vendor says "not_affected" but your runtime shows the function was called, you have a *conflict*, not a false positive. The lattice preserves this for policy resolution.
**Why it matters:** "Here's the exact call path" vs "potentially reachable." Signed, not claimed.
**Modules:** `VexLens`, `TrustLatticeEngine`, `Excititor` (110+ tests passing)
**Modules:** `ReachGraph`, `PathWitnessBuilder`
---
### 9. Deterministic Replay
## 3. Sovereign Crypto Profiles
**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC (post-quantum) profiles are configuration toggles, not recompiles.
| Profile | Algorithms | Use Case |
|---------|-----------|----------|
| **FIPS-140-3** | ECDSA P-256, RSA-PSS | US federal requirements |
| **eIDAS** | ETSI TS 119 312 | EU qualified signatures |
| **GOST-2012** | GOST R 34.10-2012 | Russian Federation |
| **SM2** | GM/T 0003.2-2012 | People's Republic of China |
| **PQC** | Dilithium, Falcon | Post-quantum readiness |
**Why it matters:** Multi-signature DSSE envelopes (sign with FIPS *and* GOST) for cross-jurisdiction compliance. No competitor offers this.
**Modules:** `Cryptography`, `CryptoProfile`, `RootPack`
---
## 4. Deterministic Replay
**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed with `stella replay srm.yaml`.
**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed.
```bash
# Six months later, prove what you knew
@@ -93,212 +188,62 @@ stella replay srm.yaml --assert-digest sha256:abc123...
# Output: PASS - identical result
```
**What's frozen:**
- Feed snapshots (NVD, KEV, EPSS, distro advisories) with content hashes
- Analyzer versions and configs
- Policy rules and lattice state
- Random seeds for deterministic ordering
**What's frozen:** Feed snapshots, analyzer versions, policy rules, random seeds.
**Why it matters:** This is what "audit-grade" actually means. Not "we logged it" but "you can re-run it."
**Modules:** `Replay`, `Scanner`, `Policy`
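
To make "you can re-run it" concrete, the following Python sketch replays a toy scan from a frozen manifest and compares result digests. The manifest fields and the `run_scan`, `result_digest`, and `replay` functions are illustrative assumptions; they do not reflect the real SRM format or the `stella replay` implementation.

```python
import hashlib
import json
import random


def run_scan(manifest: dict) -> dict:
    """Toy scan: every input, including the random seed, comes from the manifest."""
    rng = random.Random(manifest["seed"])
    findings = sorted(set(manifest["sbom"]) & set(manifest["feed_snapshot"]))
    rng.shuffle(findings)  # randomised step, reproducible because the seed is frozen
    return {"findings": findings, "policy_version": manifest["policy_version"]}


def result_digest(result: dict) -> str:
    """Digest of the canonical JSON form of a scan result."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(manifest: dict, assert_digest: str) -> str:
    """Re-run the scan and compare against the digest recorded earlier."""
    return "PASS" if result_digest(run_scan(manifest)) == assert_digest else "FAIL"
```

The check only passes while nothing outside the manifest influences the result, which is why feeds, analyzer versions, policy rules, and seeds all have to be frozen.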
### 10. Sovereign Crypto Profiles
**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC profiles are configuration toggles.
| Profile | Use Case |
|---------|----------|
| **FIPS-140-3** | US federal |
| **eIDAS** | EU qualified signatures |
| **GOST-2012** | Russian Federation |
| **SM2** | People's Republic of China |
| **PQC** | Post-quantum readiness |
**Modules:** `Cryptography`, `CryptoProfile`
### 11. Offline Operations (Air-Gap Parity)
**Full functionality without network.** Offline Update Kits bundle everything needed.
| Component | Offline Method |
|-----------|----------------|
| Feed updates | Sealed bundle with Merkle roots |
| Crypto verification | Embedded revocation lists |
| Transparency logging | Local transparency mirror |
**Modules:** `AirGap.Controller`, `TrustStore`
---
## 5. Offline Operations (Air-Gap Parity)
## Competitive Moats Summary
**Full functionality without network.** Offline Update Kits bundle everything needed for air-gapped operation.
**Six capabilities no competitor offers together:**
| Component | Online | Offline |
|-----------|--------|---------|
| Feed updates | Live | Sealed bundle with Merkle roots |
| Crypto verification | OCSP/CRL | Embedded revocation lists |
| Transparency logging | Rekor | Local transparency mirror |
| Trust roots | Live TSL | RootPack bundles |
| # | Capability | Category |
|---|-----------|----------|
| 1 | **Non-Kubernetes Specialization** | Release orchestration |
| 2 | **Digest-First Release Identity** | Release orchestration |
| 3 | **Security Gates in Promotion Flow** | Release orchestration |
| 4 | **Signed Reachability Proofs** | Security |
| 5 | **Deterministic Replay** | Security |
| 6 | **Sovereign + Offline Operation** | Operations |
**Why it matters:** Air-gapped environments get *identical* results to connected, not degraded. Competitors offer partial offline (cached feeds) but not epistemic parity (sealed, reproducible knowledge state).
**Modules:** `AirGap.Controller`, `TrustStore`, `EgressPolicy`
---
## 6. Signed Reachability Proofs
**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE; optional edge-bundle attestations for contested paths.
| Layer | What It Proves | Attestation |
|-------|---------------|-------------|
| **Static** | Call graph says function is reachable | Graph-level DSSE |
| **Binary** | Compiled binary contains the symbol | Build-ID mapping |
| **Runtime** | Process actually executed the code path | Edge-bundle DSSE (optional) |
**Why it matters:** Not "potentially reachable" but "here's the exact call path from `main()` to `vulnerable_function()`." You can quarantine or dispute individual edges, not just all-or-nothing.
**No competitor signs reachability graphs.** They claim reachability; we *prove* it.
**Modules:** `ReachGraph`, `PathWitnessBuilder`, `CompositeGateDetector`
---
## 7. Semantic Smart-Diff
**Diff security meaning, not CVE counts.** Compare reachability graphs, policy outcomes, and trust weights between releases.
```
Before: 5 critical CVEs (3 reachable)
After: 7 critical CVEs (1 reachable)
Smart-Diff output: "Exploitability DECREASED by 67% despite +2 CVEs"
```
**What's compared:**
- Reachability graph deltas
- VEX state changes
- Policy outcome changes
- Trust weight shifts
**Why it matters:** "+3 CVEs" tells you nothing. "Reachable attack surface dropped by half" tells you everything.
**Modules:** `MaterialRiskChangeDetector`, `RiskStateSnapshot`, `Scanner.ReachabilityDrift`
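
The 67% figure in the example follows from the change in reachable CVEs: (3 − 1) / 3 ≈ 0.67. A short Python sketch of that arithmetic is shown here, assuming exploitability is measured purely by the count of reachable critical CVEs; the source does not define the metric, and `Snapshot` and `smart_diff` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    critical: int   # total critical CVEs
    reachable: int  # critical CVEs with a reachable path


def smart_diff(before: Snapshot, after: Snapshot) -> str:
    """Summarise the change in reachable CVEs as a percentage."""
    if before.reachable == 0:
        raise ValueError("percentage change is undefined with no reachable baseline")
    pct = (after.reachable - before.reachable) / before.reachable * 100
    direction = "DECREASED" if pct < 0 else "INCREASED" if pct > 0 else "UNCHANGED"
    cve_delta = after.critical - before.critical
    return f"Exploitability {direction} by {abs(pct):.0f}% ({cve_delta:+d} CVEs)"
```

For the snapshots above, `smart_diff(Snapshot(5, 3), Snapshot(7, 1))` reports a 67% decrease alongside the +2 CVE count.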
---
## 8. Unknowns as First-Class State
**Uncertainty is risk—we surface and score it.** Explicit modeling of what we *don't* know, with policy implications.
| Band | Meaning | Policy Action |
|------|---------|---------------|
| **HOT** | High uncertainty + exploit pressure | Immediate investigation |
| **WARM** | Moderate uncertainty | Scheduled review |
| **COLD** | Low uncertainty | Decay toward resolution |
| **RESOLVED** | Uncertainty eliminated | No action |
**Why it matters:** Competitors hide unknowns (assume safe). We track them with decay algorithms, blast-radius containment, and policy budgets ("fail if unknowns > N").
**Modules:** `UnknownStateLedger`, `Policy`, `Signals`
---
## 9. Three-Layer Reachability Proofs
**Structural false positive elimination.** All three layers must align for exploitability to be confirmed.
```
Layer 1 (Static): Call graph shows path from entrypoint → vulnerable function
Layer 2 (Binary): Compiled binary contains the symbol with matching offset
Layer 3 (Runtime): eBPF probe confirms function was actually executed
```
**Confidence tiers:**
- **Confirmed** — All three layers agree
- **Likely** — Static + binary agree; no runtime data
- **Present** — Package present; no reachability evidence
- **Unreachable** — Static analysis proves no path exists
**Why it matters:** False positives become *structurally impossible*, not heuristically reduced.
**Modules:** `Scanner.VulnSurfaces`, `PathWitnessBuilder`
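
The confidence tiers can be read as a small decision function over the three layers. The Python sketch below is one possible reading: the parameter names are hypothetical, `None` stands for "no data", and any combination the list does not cover falls back to "Present" as an assumption of this example.

```python
from typing import Optional


def confidence_tier(static_path: Optional[bool],
                    binary_symbol: Optional[bool],
                    runtime_seen: Optional[bool]) -> str:
    """Map per-layer evidence to a confidence tier (None means no data)."""
    if static_path is False:
        return "Unreachable"  # static analysis proves no path exists
    if static_path and binary_symbol:
        if runtime_seen:
            return "Confirmed"  # all three layers agree
        if runtime_seen is None:
            return "Likely"  # static + binary agree; no runtime data
    return "Present"  # package present; no aligned reachability evidence
```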
---
## 10. Competitive Moats Summary
**Four capabilities no competitor offers together:**
| # | Capability | Why It's Hard to Copy |
|---|-----------|----------------------|
| 1 | **Signed Reachability** | Requires three-layer instrumentation + cryptographic binding |
| 2 | **Deterministic Replay** | Requires content-addressed evidence + feed snapshotting |
| 3 | **K4 Lattice VEX** | Requires rethinking VEX from suppression to claims |
| 4 | **Sovereign Offline** | Requires pluggable crypto + offline trust roots |
**Reference:** `docs/product/competitive-landscape.md`, `docs/product/moat-strategy-summary.md`
---
## 11. Trust Algebra Engine (K4 Lattice)
**Formal conflict resolution, not naive precedence.** The lattice engine uses Belnap K4 four-valued logic to aggregate heterogeneous security assertions.
| State | Meaning | Example |
|-------|---------|---------|
| **Unknown (⊥)** | No information | New package, no VEX yet |
| **True (T)** | Positive assertion | "This CVE affects this package" |
| **False (F)** | Negative assertion | "This CVE does not affect this package" |
| **Conflict (⊤)** | Contradictory assertions | Vendor says not_affected; runtime says called |
**Security Atoms (six orthogonal propositions):**
- PRESENT, APPLIES, REACHABLE, MITIGATED, FIXED, MISATTRIBUTED
**Why it matters:** Unlike naive precedence (vendor > distro > scanner), we:
- Preserve conflicts as explicit state, not hidden
- Track critical unknowns separately from ancillary ones
- Produce deterministic, explainable dispositions
**Modules:** `TrustLatticeEngine`, `Policy` (110+ tests passing)
---
## 12. Deterministic Task Packs
**Auditable automation.** TaskRunner executes declarative Task Packs with plan-hash binding, approvals, and DSSE evidence bundles.
- **Plan-hash binding:** Task pack execution is tied to specific plan versions
- **Approval gates:** Human sign-off required before execution
- **Sealed mode:** Air-gap compatible execution
- **Evidence bundles:** DSSE-signed results for audit trails
**Why it matters:** Same workflows online or offline, with provable provenance.
**Reference:** `docs/modules/packs-registry/guides/spec.md`, `docs/modules/taskrunner/architecture.md`
---
## 13. Evidence-Grade Testing
**Determinism as a continuous guarantee.** CI lanes that make reproducibility continuously provable.
| Test Type | What It Proves |
|----------|---------------|
| **Determinism tests** | Same inputs → same outputs |
| **Offline parity tests** | Air-gapped = connected results |
| **Contract stability tests** | APIs don't break |
| **Golden fixture tests** | Historical scans still replay |
**Why it matters:** Regression-proof audits. Evidence, not assumptions, drives releases.
**Reference:** `docs/technical/testing/testing-strategy-models.md`, `docs/TEST_SUITE_OVERVIEW.md`
**Pricing moat:** No per-seat, per-project, or per-deployment tax. Limits are environments + new digests/day.
---
## Quick Reference
### Key Commands
```bash
# Determinism proof
stella scan --image <img> --srm-out a.yaml
stella scan --image <img> --srm-out b.yaml
diff a.yaml b.yaml # Identical
# Replay proof
stella replay srm.yaml --assert-digest <sha>
# Reachability proof
stella graph show --cve CVE-XXXX-YYYY --artifact <digest>
# VEX evaluation
stella vex evaluate --artifact <digest>
# Offline scan
stella rootpack import bundle.tar.gz
stella scan --offline --image <digest>
```
### Key Documents
- **Competitive Landscape**: `docs/product/competitive-landscape.md`
- **Moat Strategy**: `docs/product/moat-strategy-summary.md`
- **Proof Architecture**: `docs/modules/platform/proof-driven-moats-architecture.md`
- **Vision**: `docs/VISION.md`
- **Architecture Overview**: `docs/ARCHITECTURE_OVERVIEW.md`
- **Quickstart**: `docs/quickstart.md`
- **Product Vision**: [`docs/product/VISION.md`](product/VISION.md)
- **Architecture Overview**: [`docs/ARCHITECTURE_OVERVIEW.md`](ARCHITECTURE_OVERVIEW.md)
- **Release Orchestrator Architecture**: [`docs/modules/release-orchestrator/architecture.md`](modules/release-orchestrator/architecture.md)
- **Competitive Landscape**: [`docs/product/competitive-landscape.md`](product/competitive-landscape.md)
- **Quickstart**: [`docs/quickstart.md`](quickstart.md)
- **Feature Matrix**: [`docs/FEATURE_MATRIX.md`](FEATURE_MATRIX.md)