release orchestrator pivot, architecture and planning

2026-01-10 22:37:22 +02:00
parent c84f421e2f
commit d509c44411
130 changed files with 70292 additions and 721 deletions
--- a/docs/key-features.md
+++ b/docs/key-features.md
@@ -1,91 +1,186 @@
-# Key Features – Capability Cards
+# Key Features — Capability Cards

-> **Core Thesis:** Stella Ops isn't a scanner that outputs findings. It's a platform that outputs **attestable decisions that can be replayed**. That difference survives auditors, regulators, and supply-chain propagation.
+> **Core Thesis:** Stella Ops Suite isn't a scanner or a deployment tool—it's a **release control plane** that produces **attestable decisions that can be replayed**. Security is a gate, not a blocker. Evidence survives auditors, regulators, and supply-chain propagation.

-> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list of all platform capabilities, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability.
+> **Looking for the complete feature catalog?** See [`full-features-list.md`](full-features-list.md) for the comprehensive list, or [`FEATURE_MATRIX.md`](FEATURE_MATRIX.md) for tier-by-tier availability.

 ---

 ## At a Glance

-| What Competitors Do | What Stella Ops Does |
-|--------------------|---------------------|
-| Output findings | Output decisions with proof chains |
+| What Competitors Do | What Stella Ops Suite Does |
+|--------------------|---------------------------|
+| CI/CD runs pipelines | Central release authority across environments |
+| Deployment tools promote | Promotion with integrated security gates |
+| Scanners output findings | Security gates output decisions with proof chains |
 | VEX as suppression file | VEX as logical claim system (K4 lattice) |
-| Reachability as badge | Reachability as signed proof |
-| "+3 CVEs" reports | "Exploitability dropped 41%" semantic deltas |
-| Hide unknowns | Surface and score unknowns |
+| Release identity via tags | Release identity via immutable digests |
+| Per-seat/per-project pricing | Pay for environments + new digests/day |
 | Online-first | Offline-first with full parity |

 ---

-Each card below pairs the headline capability with the evidence that backs it and why it matters day to day.
+## Release Orchestration (Planned)

-## 0. Decision Capsules — Audit-Grade Evidence Bundles
+### 0. Release Control Plane

-**The core moat capability.** Every scan result is sealed in a **Decision Capsule**—a content-addressed bundle containing everything needed to reproduce and verify the vulnerability decision.
+**The new core capability.** Stella Ops Suite becomes the central release authority between CI and runtime targets.
+
+| Capability | What It Does |
+|------------|--------------|
+| **Environment management** | Define Dev/Stage/Prod with freeze windows and approval policies |
+| **Release bundles** | Compose releases from component OCI digests with semantic versioning |
+| **Promotion workflows** | DAG-based workflow engine with approvals, gates, and hooks |
+| **Security gates** | Scan on build, evaluate on release, re-evaluate on CVE updates |
+| **Deployment execution** | Deploy to Docker/Compose/ECS/Nomad via agents or agentless |
+| **Evidence packets** | Every release decision is cryptographically signed and stored |
+
+**Why it matters:** Non-Kubernetes container teams finally get a central release authority with audit-grade evidence—without replacing their existing CI/SCM/registry stack.
+
+### 1. Digest-First Release Identity
+
+**Tags are mutable; digests are truth.** A release is an immutable set of OCI digests, resolved at release creation time.
+
+```
+Release: myapp-v2.3.1
+Components:
+  api:      sha256:abc123...
+  worker:   sha256:def456...
+  frontend: sha256:789abc...
+```
+
+**What this enables:**
+- Tamper detection at pull time (digest mismatch = deployment failure)
+- Audit trail of exactly what was deployed
+- Rollback to known-good digests, not "latest" tags
+- "What is deployed where" tracking with integrity
+
+**Modules (planned):** `ReleaseManager`, `ComponentRegistry`, `VersionManager`
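
A minimal sketch of the digest-pinning idea above, not the planned `ReleaseManager` implementation. It assumes a release is just a name plus a read-only component-to-digest map, and it hashes raw bytes for simplicity (a real OCI digest is computed over the image manifest). `Release`, `create_release`, and `verify_pull` are hypothetical names introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Release:
    """Hypothetical release record: a name plus component -> digest pins."""
    name: str
    components: Mapping[str, str]


def digest_of(content: bytes) -> str:
    """Return a sha256 digest string in the 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def create_release(name: str, artifacts: Mapping[str, bytes]) -> Release:
    # Digests are resolved once, at release creation time, and then frozen.
    pins = {component: digest_of(blob) for component, blob in artifacts.items()}
    return Release(name=name, components=MappingProxyType(pins))


def verify_pull(release: Release, component: str, pulled: bytes) -> None:
    """Fail the deployment if the pulled content does not match the pin."""
    expected = release.components[component]
    actual = digest_of(pulled)
    if actual != expected:
        raise ValueError(
            f"digest mismatch for {component}: expected {expected}, got {actual}"
        )
```

Calling `verify_pull` with the same bytes used at creation time returns silently; any other content raises, which is the "digest mismatch = deployment failure" behavior listed above.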
+
+### 2. Promotion Workflows with Security Gates
+
+**Security integrated into release flow, not bolted on.** Promotion requests trigger gate evaluation before deployment.
+
+| Gate Type | What It Checks |
+|-----------|---------------|
+| **Security gate** | Reachable critical/high vulnerabilities |
+| **Approval gate** | Required approval count, separation of duties |
+| **Freeze window gate** | Environment freeze windows |
+| **Policy gate** | Custom OPA/Rego policies |
+| **Previous environment gate** | Release deployed to prior environment |
+
+**Decision records include:**
+- All gate results with pass/fail reasons
+- Evidence refs (scan verdicts, approval records)
+- Policy hash + inputs hash for replay
+- "Why blocked?" explainability
+
+**Modules (planned):** `PromotionManager`, `ApprovalGateway`, `DecisionEngine`
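
The gate table and decision-record fields above could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned `DecisionEngine`: the input and policy shapes, the gate function names, and the canonical-JSON hashing scheme are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    gate: str
    passed: bool
    reason: str


def canonical_hash(value) -> str:
    # Sorted keys and fixed separators keep the hash stable across runs,
    # so the same policy and inputs always hash to the same value.
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def security_gate(inputs: dict, policy: dict) -> GateResult:
    # Block only on findings that are both reachable and of a blocking severity.
    blocking = sorted(
        v["id"]
        for v in inputs["vulnerabilities"]
        if v["reachable"] and v["severity"] in policy["blocking_severities"]
    )
    if blocking:
        return GateResult("security", False, "reachable findings: " + ", ".join(blocking))
    return GateResult("security", True, "no reachable blocking findings")


def approval_gate(inputs: dict, policy: dict) -> GateResult:
    # Separation of duties: the requester's own approval does not count.
    independent = set(inputs["approvers"]) - {inputs["requested_by"]}
    needed = policy["required_approvals"]
    passed = len(independent) >= needed
    return GateResult("approval", passed, f"{len(independent)} of {needed} independent approvals")


def evaluate_promotion(inputs: dict, policy: dict, gates) -> dict:
    """Run every gate and return a decision record suitable for replay."""
    results = [gate(inputs, policy) for gate in gates]
    return {
        "allowed": all(r.passed for r in results),
        "gates": [{"gate": r.gate, "passed": r.passed, "reason": r.reason} for r in results],
        "policy_hash": canonical_hash(policy),
        "inputs_hash": canonical_hash(inputs),
    }
```

Because the record stores a hash of the policy and of the inputs, re-running `evaluate_promotion` with the same two objects yields an identical record, and each failing gate's `reason` string answers "why blocked?".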
+
+### 3. Deployment Execution
+
+**Deploy to non-Kubernetes targets as first-class citizens.** Agent-based or agentless deployment to Docker hosts, Compose, ECS, Nomad.
+
+| Target Type | Deployment Method |
+|-------------|-------------------|
+| **Docker host** | Agent pulls and starts containers |
+| **Compose host** | Agent writes `compose.stella.lock.yml` and runs `docker-compose up` |
+| **ECS service** | Agent updates task definition and service |
+| **Nomad job** | Agent updates job spec and submits |
+| **SSH remote** | Agentless via SSH (Linux) |
+| **WinRM remote** | Agentless via WinRM (Windows) |
+
+**Generated artifacts:**
+- `compose.stella.lock.yml`: Pinned digests, resolved environment refs
+- `stella.version.json`: Version sticker on target for drift detection
+- `release.evidence.json`: Decision record
+
+**Modules (planned):** `DeployOrchestrator`, `Agent.*`, `ArtifactGenerator`
+
+### 4. Progressive Delivery
+
+**A/B releases and canary deployments.** Gradual rollout with automatic rollback on health failure.
+
+| Strategy | Description |
+|----------|-------------|
+| **Immediate** | 0% → 100% instantly |
+| **Canary** | 10% → 25% → 50% → 100% with health checks |
+| **Blue-green** | Deploy to B, switch traffic, retire A |
+| **Rolling** | 10% at a time with health checks |
+
+**Traffic routing plugins:** Nginx, HAProxy, Traefik, AWS ALB
+
+**Modules (planned):** `ABManager`, `TrafficRouter`, `CanaryController`
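
As a rough sketch of the canary row in the table above: traffic is raised step by step, and the rollout reverts as soon as a health check fails. The `set_traffic` and `healthy` callbacks are hypothetical stand-ins for a traffic-routing plugin and a health probe; the planned `CanaryController` may work differently.

```python
from typing import Callable, List, Sequence

# Step percentages taken from the Canary row of the strategy table.
CANARY_STEPS = (10, 25, 50, 100)


def run_canary(
    set_traffic: Callable[[int], None],
    healthy: Callable[[int], bool],
    steps: Sequence[int] = CANARY_STEPS,
) -> List[int]:
    """Shift traffic through each step; roll back to 0% on a failed check.

    Returns the sequence of traffic percentages that were applied.
    """
    applied: List[int] = []
    for percent in steps:
        set_traffic(percent)
        applied.append(percent)
        if not healthy(percent):
            set_traffic(0)  # automatic rollback on health failure
            applied.append(0)
            break
    return applied
```

With a probe that always passes, the function walks through every step up to 100%. With a probe that fails at 50%, it stops there and ends at 0%.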
+
+### 5. Plugin System (Three-Surface Model)
+
+**Extensible without core code changes.** Plugins contribute through three surfaces.
+
+| Surface | What It Does |
+|---------|--------------|
+| **Manifest** | Declares what the plugin provides (integrations, steps, agents) |
+| **Connector runtime** | gRPC interface for runtime operations |
+| **Step provider** | Execution characteristics for workflow steps |
+
+**Plugin types:**
+- **Integration connectors:** SCM (GitHub, GitLab), CI (Actions, Jenkins), Registry (Harbor, ECR), Vault (HashiCorp, AWS Secrets)
+- **Step providers:** Custom workflow steps
+- **Agent types:** New deployment targets
+- **Gate providers:** Custom gate evaluations
+
+**Modules (planned):** `PluginRegistry`, `PluginLoader`, `PluginSandbox`, `PluginSDK`
+
+---
+
+## Security Capabilities (Operational)
+
+### 6. Decision Capsules — Audit-Grade Evidence Bundles
+
+**Every scan and release decision is sealed.** A Decision Capsule is a content-addressed bundle containing everything needed to reproduce and verify the decision.

 | Component | What's Included |
 |-----------|----------------|
-| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version, lattice rules |
+| **Inputs** | Exact SBOM, frozen feed snapshots (with Merkle roots), policy version |
 | **Evidence** | Reachability proofs (static + runtime), VEX statements, binary fingerprints |
 | **Outputs** | Verdicts, risk scores, remediation paths |
 | **Signatures** | DSSE envelopes over all of the above |

-**Why it matters:** Six months from now, an auditor can run `stella replay srm.yaml --assert-digest <sha>` and get *identical* results. This is what "audit-grade assurance" actually means.
+**Why it matters:** Auditors can replay any decision bit-for-bit. This is what "audit-grade assurance" actually means.

-**No competitor offers this.** Trivy, Grype, Snyk—none can replay a past scan bit-for-bit because they don't freeze feeds or produce deterministic manifests.
+**Modules:** `EvidenceLocker`, `Attestor`, `Replay`

-## 1. Delta SBOM Engine
+### 7. Lattice Policy + OpenVEX (K4 Logic)

-**Performance without sacrificing determinism.** Layer-aware ingestion keeps the SBOM catalog content-addressed; rescans only fetch new layers.
+**VEX as a logical claim system, not a suppression file.** The policy engine uses Belnap K4 four-valued logic.

- **Speed:** Warm scans < 1 second; CI/CD pipelines stay fast
- **Determinism:** Replay Manifest (SRM) captures exact analyzer inputs/outputs per layer
- **Evidence:** Binary crosswalk via Build-ID mapping; `bin:{sha256}` fallbacks for stripped binaries
+| State | Meaning |
+|-------|---------|
+| **Unknown (bottom)** | No information |
+| **True** | Positive assertion |
+| **False** | Negative assertion |
+| **Conflict (top)** | Contradictory assertions |

-**Modules:** `Scanner`, `SbomService`, `BinaryIndex`
+**Why it matters:** When a vendor says "not_affected" but runtime evidence shows the function was called, you have a *conflict*, not a false positive.

---
+**Modules:** `VexLens`, `TrustLatticeEngine`, `Policy`
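
The four states in the table above can be modeled as pairs of (evidence for, evidence against), with claims merged by a component-wise OR, which is the join in Belnap's knowledge ordering. This is a minimal sketch of that logic only. How `TrustLatticeEngine` actually represents states or weights sources is not shown in this document, and the mapping of a "not_affected" statement to `FALSE` is an assumption made for the example.

```python
from enum import Enum
from functools import reduce
from typing import Iterable


class K4(Enum):
    # Each value is (evidence_for, evidence_against) for a proposition
    # such as "this CVE affects this artifact".
    UNKNOWN = (False, False)   # bottom: no information
    TRUE = (True, False)       # positive assertion
    FALSE = (False, True)      # negative assertion
    CONFLICT = (True, True)    # top: contradictory assertions


def join(a: K4, b: K4) -> K4:
    """Combine two claims: evidence accumulates and is never discarded."""
    return K4((a.value[0] or b.value[0], a.value[1] or b.value[1]))


def merge(claims: Iterable[K4]) -> K4:
    """Fold any number of claims, starting from UNKNOWN."""
    return reduce(join, claims, K4.UNKNOWN)
```

For example, `merge([K4.FALSE, K4.TRUE])` (a vendor "not_affected" claim plus a runtime observation that the function was called) evaluates to `K4.CONFLICT`, while `merge([])` stays `K4.UNKNOWN`.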

-## 2. Lattice Policy + OpenVEX (K4 Logic)
+### 8. Signed Reachability Proofs

-**VEX as a logical claim system, not a suppression file.** The policy engine uses **Belnap K4 four-valued logic** (Unknown, True, False, Conflict) to merge SBOM, advisories, VEX, and waivers.
+**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE.

-| What Competitors Do | What Stella Does |
-|--------------------|------------------|
-| VEX filters findings (boolean) | VEX is logical claims with trust weighting |
-| Conflicts hidden | Conflicts are explicit state (⊤) |
-| "Vendor says not_affected" = done | Vendor + runtime + reachability merged; conflicts surfaced |
-| Unknown = assume safe | Unknown = first-class state with risk implications |
+| Layer | What It Proves |
+|-------|---------------|
+| **Static** | Call graph shows path from entrypoint → vulnerable function |
+| **Binary** | Compiled binary contains the symbol |
+| **Runtime** | Process actually executed the code path |

-**Why it matters:** When vendor says "not_affected" but your runtime shows the function was called, you have a *conflict*—not a false positive. The lattice preserves this for policy resolution.
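+**Why it matters:** A signed proof shows the exact call path from entrypoint to vulnerable function instead of a "potentially reachable" label, so the claim can be verified rather than taken on trust.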

-**Modules:** `VexLens`, `TrustLatticeEngine`, `Excititor` (110+ tests passing)
+**Modules:** `ReachGraph`, `PathWitnessBuilder`

---
+### 9. Deterministic Replay

-## 3. Sovereign Crypto Profiles
-
-**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC (post-quantum) profiles are configuration toggles, not recompiles.
-
-| Profile | Algorithms | Use Case |
-|---------|-----------|----------|
-| **FIPS-140-3** | ECDSA P-256, RSA-PSS | US federal requirements |
-| **eIDAS** | ETSI TS 119 312 | EU qualified signatures |
-| **GOST-2012** | GOST R 34.10-2012 | Russian Federation |
-| **SM2** | GM/T 0003.2-2012 | People's Republic of China |
-| **PQC** | Dilithium, Falcon | Post-quantum readiness |
-
-**Why it matters:** Multi-signature DSSE envelopes (sign with FIPS *and* GOST) for cross-jurisdiction compliance. No competitor offers this.
-
-**Modules:** `Cryptography`, `CryptoProfile`, `RootPack`
-
---
-
-## 4. Deterministic Replay
-
-**The audit-grade guarantee.** Every scan produces a DSSE + SRM bundle that can be replayed with `stella replay srm.yaml`.
+**The audit-grade guarantee.** Every scan produces a bundle containing a DSSE envelope and a Replay Manifest (SRM) that can be replayed.

 ```bash
 # Six months later, prove what you knew
@@ -93,212 +188,62 @@ stella replay srm.yaml --assert-digest sha256:abc123...
 # Output: PASS - identical result
 ```

-**What's frozen:**
- Feed snapshots (NVD, KEV, EPSS, distro advisories) with content hashes
- Analyzer versions and configs
- Policy rules and lattice state
- Random seeds for deterministic ordering
+**What's frozen:** Feed snapshots, analyzer versions, policy rules, random seeds.

-**Why it matters:** This is what "audit-grade" actually means. Not "we logged it" but "you can re-run it."
+**Modules:** `Replay`, `Scanner`, `Policy`
+
+### 10. Sovereign Crypto Profiles
+
+**Regional compliance without code changes.** FIPS, eIDAS, GOST, SM, and PQC profiles are configuration toggles.
+
+| Profile | Use Case |
+|---------|----------|
+| **FIPS-140-3** | US federal |
+| **eIDAS** | EU qualified signatures |
+| **GOST-2012** | Russian Federation |
+| **SM2** | People's Republic of China |
+| **PQC** | Post-quantum readiness |
+
+**Modules:** `Cryptography`, `CryptoProfile`
+
+### 11. Offline Operations (Air-Gap Parity)
+
+**Full functionality without network.** Offline Update Kits bundle everything needed.
+
+| Component | Offline Method |
+|-----------|----------------|
+| Feed updates | Sealed bundle with Merkle roots |
+| Crypto verification | Embedded revocation lists |
+| Transparency logging | Local transparency mirror |
+
+**Modules:** `AirGap.Controller`, `TrustStore`

 ---

-## 5. Offline Operations (Air-Gap Parity)
+## Competitive Moats Summary

-**Full functionality without network.** Offline Update Kits bundle everything needed for air-gapped operation.
+**Six capabilities no competitor offers together:**

-| Component | Online | Offline |
-|-----------|--------|---------|
-| Feed updates | Live | Sealed bundle with Merkle roots |
-| Crypto verification | OCSP/CRL | Embedded revocation lists |
-| Transparency logging | Rekor | Local transparency mirror |
-| Trust roots | Live TSL | RootPack bundles |
+| # | Capability | Category |
+|---|-----------|----------|
+| 1 | **Non-Kubernetes Specialization** | Release orchestration |
+| 2 | **Digest-First Release Identity** | Release orchestration |
+| 3 | **Security Gates in Promotion Flow** | Release orchestration |
+| 4 | **Signed Reachability Proofs** | Security |
+| 5 | **Deterministic Replay** | Security |
+| 6 | **Sovereign + Offline Operation** | Operations |

-**Why it matters:** Air-gapped environments get *identical* results to connected, not degraded. Competitors offer partial offline (cached feeds) but not epistemic parity (sealed, reproducible knowledge state).
-
-**Modules:** `AirGap.Controller`, `TrustStore`, `EgressPolicy`
-
---
-
-## 6. Signed Reachability Proofs
-
-**Proof of exploitability, not just a badge.** Every reachability graph is sealed with DSSE; optional edge-bundle attestations for contested paths.
-
-| Layer | What It Proves | Attestation |
-|-------|---------------|-------------|
-| **Static** | Call graph says function is reachable | Graph-level DSSE |
-| **Binary** | Compiled binary contains the symbol | Build-ID mapping |
-| **Runtime** | Process actually executed the code path | Edge-bundle DSSE (optional) |
-
-**Why it matters:** Not "potentially reachable" but "here's the exact call path from `main()` to `vulnerable_function()`." You can quarantine or dispute individual edges, not just all-or-nothing.
-
-**No competitor signs reachability graphs.** They claim reachability; we *prove* it.
-
-**Modules:** `ReachGraph`, `PathWitnessBuilder`, `CompositeGateDetector`
-
---
-
-## 7. Semantic Smart-Diff
-
-**Diff security meaning, not CVE counts.** Compare reachability graphs, policy outcomes, and trust weights between releases.
-
-```
-Before: 5 critical CVEs (3 reachable)
-After:  7 critical CVEs (1 reachable)
-
-Smart-Diff output: "Exploitability DECREASED by 67% despite +2 CVEs"
-```
-
-**What's compared:**
- Reachability graph deltas
- VEX state changes
- Policy outcome changes
- Trust weight shifts
-
-**Why it matters:** "+3 CVEs" tells you nothing. "Reachable attack surface dropped by half" tells you everything.
-
-**Modules:** `MaterialRiskChangeDetector`, `RiskStateSnapshot`, `Scanner.ReachabilityDrift`
-
---
-
-## 8. Unknowns as First-Class State
-
-**Uncertainty is risk—we surface and score it.** Explicit modeling of what we *don't* know, with policy implications.
-
-| Band | Meaning | Policy Action |
-|------|---------|---------------|
-| **HOT** | High uncertainty + exploit pressure | Immediate investigation |
-| **WARM** | Moderate uncertainty | Scheduled review |
-| **COLD** | Low uncertainty | Decay toward resolution |
-| **RESOLVED** | Uncertainty eliminated | No action |
-
-**Why it matters:** Competitors hide unknowns (assume safe). We track them with decay algorithms, blast-radius containment, and policy budgets ("fail if unknowns > N").
-
-**Modules:** `UnknownStateLedger`, `Policy`, `Signals`
-
---
-
-## 9. Three-Layer Reachability Proofs
-
-**Structural false positive elimination.** All three layers must align for exploitability to be confirmed.
-
-```
-Layer 1 (Static):  Call graph shows path from entrypoint → vulnerable function
-Layer 2 (Binary):  Compiled binary contains the symbol with matching offset
-Layer 3 (Runtime): eBPF probe confirms function was actually executed
-```
-
-**Confidence tiers:**
- **Confirmed** — All three layers agree
- **Likely** — Static + binary agree; no runtime data
- **Present** — Package present; no reachability evidence
- **Unreachable** — Static analysis proves no path exists
-
-**Why it matters:** False positives become *structurally impossible*, not heuristically reduced.
-
-**Modules:** `Scanner.VulnSurfaces`, `PathWitnessBuilder`
-
---
-
-## 10. Competitive Moats Summary
-
-**Four capabilities no competitor offers together:**
-
-| # | Capability | Why It's Hard to Copy |
-|---|-----------|----------------------|
-| 1 | **Signed Reachability** | Requires three-layer instrumentation + cryptographic binding |
-| 2 | **Deterministic Replay** | Requires content-addressed evidence + feed snapshotting |
-| 3 | **K4 Lattice VEX** | Requires rethinking VEX from suppression to claims |
-| 4 | **Sovereign Offline** | Requires pluggable crypto + offline trust roots |
-
-**Reference:** `docs/product/competitive-landscape.md`, `docs/product/moat-strategy-summary.md`
-
---
-
-## 11. Trust Algebra Engine (K4 Lattice)
-
-**Formal conflict resolution, not naive precedence.** The lattice engine uses Belnap K4 four-valued logic to aggregate heterogeneous security assertions.
-
-| State | Meaning | Example |
-|-------|---------|---------|
-| **Unknown (⊥)** | No information | New package, no VEX yet |
-| **True (T)** | Positive assertion | "This CVE affects this package" |
-| **False (F)** | Negative assertion | "This CVE does not affect this package" |
-| **Conflict (⊤)** | Contradictory assertions | Vendor says not_affected; runtime says called |
-
-**Security Atoms (six orthogonal propositions):**
- PRESENT, APPLIES, REACHABLE, MITIGATED, FIXED, MISATTRIBUTED
-
-**Why it matters:** Unlike naive precedence (vendor > distro > scanner), we:
- Preserve conflicts as explicit state, not hidden
- Track critical unknowns separately from ancillary ones
- Produce deterministic, explainable dispositions
-
-**Modules:** `TrustLatticeEngine`, `Policy` (110+ tests passing)
-
---
-
-## 12. Deterministic Task Packs
-
-**Auditable automation.** TaskRunner executes declarative Task Packs with plan-hash binding, approvals, and DSSE evidence bundles.
-
- **Plan-hash binding:** Task pack execution is tied to specific plan versions
- **Approval gates:** Human sign-off required before execution
- **Sealed mode:** Air-gap compatible execution
- **Evidence bundles:** DSSE-signed results for audit trails
-
-**Why it matters:** Same workflows online or offline, with provable provenance.
-
-**Reference:** `docs/modules/packs-registry/guides/spec.md`, `docs/modules/taskrunner/architecture.md`
-
---
-
-## 13. Evidence-Grade Testing
-
-**Determinism as a continuous guarantee.** CI lanes that make reproducibility continuously provable.
-
-| Test Type | What It Proves |
-|----------|---------------|
-| **Determinism tests** | Same inputs → same outputs |
-| **Offline parity tests** | Air-gapped = connected results |
-| **Contract stability tests** | APIs don't break |
-| **Golden fixture tests** | Historical scans still replay |
-
-**Why it matters:** Regression-proof audits. Evidence, not assumptions, drives releases.
-
-**Reference:** `docs/technical/testing/testing-strategy-models.md`, `docs/TEST_SUITE_OVERVIEW.md`
+**Pricing moat:** No per-seat, per-project, or per-deployment tax. Limits are based on the number of environments and new digests per day.

 ---

 ## Quick Reference

-### Key Commands
-
-```bash
-# Determinism proof
-stella scan --image <img> --srm-out a.yaml
-stella scan --image <img> --srm-out b.yaml
-diff a.yaml b.yaml  # Identical
-
-# Replay proof
-stella replay srm.yaml --assert-digest <sha>
-
-# Reachability proof
-stella graph show --cve CVE-XXXX-YYYY --artifact <digest>
-
-# VEX evaluation
-stella vex evaluate --artifact <digest>
-
-# Offline scan
-stella rootpack import bundle.tar.gz
-stella scan --offline --image <digest>
-```
-
 ### Key Documents

- **Competitive Landscape**: `docs/product/competitive-landscape.md`
- **Moat Strategy**: `docs/product/moat-strategy-summary.md`
- **Proof Architecture**: `docs/modules/platform/proof-driven-moats-architecture.md`
- **Vision**: `docs/VISION.md`
- **Architecture Overview**: `docs/ARCHITECTURE_OVERVIEW.md`
- **Quickstart**: `docs/quickstart.md`
+- **Product Vision**: [`docs/product/VISION.md`](product/VISION.md)
+- **Architecture Overview**: [`docs/ARCHITECTURE_OVERVIEW.md`](ARCHITECTURE_OVERVIEW.md)
+- **Release Orchestrator Architecture**: [`docs/modules/release-orchestrator/architecture.md`](modules/release-orchestrator/architecture.md)
+- **Competitive Landscape**: [`docs/product/competitive-landscape.md`](product/competitive-landscape.md)
+- **Quickstart**: [`docs/quickstart.md`](quickstart.md)
+- **Feature Matrix**: [`docs/FEATURE_MATRIX.md`](FEATURE_MATRIX.md)