save progress

2026-01-09 18:27:36 +02:00
parent e608752924
commit a21d3dbc1f
361 changed files with 63068 additions and 1192 deletions


@@ -1,545 +0,0 @@
Below is a cohesive set of **7 product advisories** that together define an “AI-native” Stella Ops with defensible moats. Each advisory follows the same structure:
* **Problem** (what hurts today)
* **Why** (why Stella should solve it)
* **What we ship** (capabilities, boundaries)
* **How we achieve** (proposed `AdvisoryAI` backend modules + key UI components)
* **Guardrails** (safety / trust / determinism)
* **KPIs** (how you prove it works)
I'm assuming your canonical object model already includes **Runs** (incident/escalation/change investigation runs) and a system of record in **PostgreSQL**, with **Valkey** as a non-authoritative accelerator.
---
# ADVISORY-AI-000 — AdvisoryAI Foundation: Chat + Workbench + Runs (the “AI OS surface”)
## Problem
Most “AI in ops” fails because it's only a chat box. Chat is not:
* auditable
* repeatable
* actionable with guardrails
* collaborative (handoffs, approvals, artifacts)
Operators need a place where AI output becomes **objects** (runs, decisions, patches, evidence packs), not ephemeral text.
## Why we do it
This advisory is the substrate for all other moats. Without it, your other features remain demos.
## What we ship
1. **AdvisoryAI Orchestrator** that can:
   * read Stella objects (runs, services, policies, evidence)
   * propose plans
   * call tools/actions (within policy)
   * produce structured artifacts (patches, decision records, evidence packs)
2. **AI Workbench UI**:
   * Chat panel for intent
   * Artifact cards (Run, Playbook Patch, Decision, Evidence Pack)
   * Run Timeline view (what happened, tool calls, approvals, outputs)
## How we achieve (modules + UI)
### Backend modules (suggested)
* `StellaOps.AdvisoryAI.WebService`
  * Conversation/session orchestration
  * Tool routing + action execution requests
  * Artifact creation (Run notes, patches, decisions)
* `StellaOps.AdvisoryAI.Prompting`
  * Prompt templates versioned + hashed
  * Guarded system prompts per “mode”
* `StellaOps.AdvisoryAI.Tools`
  * Tool contracts (read-only queries, action requests)
* `StellaOps.AdvisoryAI.Eval`
  * Regression tests for tool correctness + safety
### UI components
* `AiChatPanelComponent`
* `AiArtifactCardComponent` (Run/Decision/Patch/Evidence Pack)
* `RunTimelineComponent` (with “AI steps” and “human steps”)
* `ModeSelectorComponent` (Analyst / Operator / Autopilot)
### Canonical flow
```
User intent (chat)
-> AdvisoryAI proposes plan (steps)
-> executes read-only tools
-> generates artifact(s)
-> requests approvals for risky actions
-> records everything on Run timeline
```
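A minimal TypeScript sketch of how this flow could be represented as data; the `Plan`, `PlanStep`, and `RunEvent` shapes below are illustrative assumptions, not the shipped contracts:
```ts
// Illustrative shapes for the canonical flow; names are assumptions, not shipped contracts.
type Mode = "analyst" | "operator" | "autopilot";

interface ToolCall {
  tool: string;              // e.g. a read-only query tool or an action request
  input: unknown;
  readOnly: boolean;
}

interface PlanStep {
  description: string;
  toolCall?: ToolCall;
  requiresApproval: boolean; // risky actions pause here for human approval
}

interface Plan {
  runId: string;               // every AI session attaches to a Run
  mode: Mode;
  promptTemplateHash: string;  // versioned + hashed prompt that produced the plan
  steps: PlanStep[];
}

interface RunEvent {
  runId: string;
  kind: "ai-step" | "human-step" | "tool-call" | "approval" | "artifact";
  payload: unknown;
  at: string;                  // ISO timestamp; persisted for audit and replay
}

// Everything the orchestrator does becomes an entry on the Run timeline.
function record(timeline: RunEvent[], event: RunEvent): void {
  timeline.push(event);
}
```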
## Guardrails
* Every AI interaction writes to a **Run** (or attaches to an existing Run).
* Prompt templates are **versioned + hashed**.
* Tool calls and outputs are **persisted** (for audit and replay).
## KPIs
* % AI sessions attached to Runs
* “Time to first useful artifact”
* Operator adoption (weekly active users of Workbench)
---
# ADVISORY-AI-001 — Evidence-First Outputs (trust-by-construction)
## Problem
In ops, an answer without evidence is a liability. LLMs are persuasive even when wrong. Operators waste time verifying or, worse, act on incorrect claims.
## Why we do it
Evidence-first output is the trust prerequisite for:
* automation
* playbook learning
* org memory
* executive reporting
## What we ship
* A **Claim → Evidence** constraint:
  * Each material claim must be backed by an `EvidenceRef` (query snapshot, ticket, pipeline run, commit, config state).
* An **Evidence Pack** artifact:
  * A shareable bundle of evidence for an incident/change/review.
## How we achieve (modules + UI)
### Backend modules
* `StellaOps.AdvisoryAI.Evidence`
  * Claim extraction from model output
  * Evidence retrieval + snapshotting
  * Citation enforcement (or downgrade claim confidence)
* `StellaOps.EvidenceStore`
  * Immutable (or content-addressed) snapshots
  * Hashes, timestamps, query parameters
### UI components
* `EvidenceSidePanelComponent` (opens from inline citations)
* `EvidencePackViewerComponent`
* `ConfidenceBadgeComponent` (Verified / Inferred / Unknown)
### Implementation pattern
* For each answer:
  1. Draft response
  2. Extract claims
  3. Attach evidence refs
  4. If evidence is missing: label as uncertain + propose verification steps
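A minimal TypeScript sketch of the claim → evidence step above; `Claim`, `EvidenceRef`, and `labelClaims` are illustrative assumptions rather than the actual `StellaOps.AdvisoryAI.Evidence` contract:
```ts
// Illustrative sketch: downgrade claims that lack evidence instead of asserting them.
type Confidence = "verified" | "inferred" | "unknown";

interface EvidenceRef {
  kind: "query-snapshot" | "ticket" | "pipeline-run" | "commit" | "config-state";
  pointer: string;       // storage pointer to the immutable snapshot
  contentHash: string;   // hash of the raw result for tamper evidence
  capturedAt: string;    // timestamp; snapshots also record query inputs and time range
}

interface Claim {
  text: string;
  evidence: EvidenceRef[];
  confidence: Confidence;
}

// If evidence is missing, label the claim uncertain and suggest a verification step.
function labelClaims(claims: Claim[]): Claim[] {
  return claims.map((c) =>
    c.evidence.length > 0
      ? { ...c, confidence: "verified" }
      : { ...c, confidence: "unknown", text: `${c.text} (unverified; propose verification step)` }
  );
}
```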
## Guardrails
* If evidence is missing, Stella must **not** assert certainty.
* Evidence snapshots must capture:
  * query inputs
  * time range
  * raw result (or hash + storage pointer)
## KPIs
* Citation coverage (% of answers with evidence refs)
* Reduced back-and-forth (“how do you know?” rate)
* Adoption of automation after evidence-first rollout
---
# ADVISORY-AI-002 — Policy-Aware Automation (safe actions, not just suggestions)
## Problem
The main blocker to “AI that acts” is governance:
* wrong environment
* insufficient permission
* missing approvals
* non-idempotent actions
* unclear accountability
## Why we do it
If Stella can't safely execute actions, it will remain a read-only assistant. Policy-aware automation is a hard moat because it requires real engineering discipline and operational maturity.
## What we ship
* A typed **Action Registry**:
  * schemas, risk levels, idempotency, rollback/compensation
* A **Policy decision point** (PDP) before any action:
  * allow / allow-with-approvals / deny
* An **Approval workflow** linked to Runs
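A hedged sketch of what a typed action definition and the PDP check above could look like; the field names and the `decide` function are illustrative assumptions:
```ts
// Illustrative action registry entry and policy decision point (PDP).
type RiskLevel = "low" | "medium" | "high";
type PolicyVerdict = "allow" | "allow-with-approvals" | "deny";

interface ActionDefinition {
  name: string;                // e.g. a service restart action
  riskLevel: RiskLevel;
  idempotent: boolean;         // requests must carry an idempotency key
  compensatingAction?: string; // rollback/compensation where feasible
}

interface ActionRequest {
  action: ActionDefinition;
  environment: "dev" | "stage" | "prod";
  idempotencyKey: string;
  runId: string;               // audit linkage: why, and on which Run
  requestedBy: string;
}

interface PolicyContext {
  freezeWindowActive: boolean;
  requesterRoles: string[];
}

// Simplified PDP: environment protections, freeze windows, role constraints.
function decide(req: ActionRequest, ctx: PolicyContext): PolicyVerdict {
  if (ctx.freezeWindowActive && req.environment === "prod") return "deny";
  if (req.environment === "prod" || req.action.riskLevel === "high") {
    return "allow-with-approvals";
  }
  return "allow";
}
```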
## How we achieve (modules + UI)
### Backend modules
* `StellaOps.ActionRegistry`
  * Action definitions + schemas + risk metadata
* `StellaOps.PolicyEngine`
  * Rules: environment protections, freeze windows, role constraints
* `StellaOps.AdvisoryAI.Automation`
  * Converts intent → action proposals
  * Submits action requests after approvals
* `StellaOps.RunLedger`
  * Every action request + result is a ledger entry
### UI components
* `ActionProposalCardComponent`
* `ApprovalModalComponent` (scoped approval: this action/this run/this window)
* `PolicyExplanationComponent` (human-readable “why allowed/denied”)
* `RollbackPanelComponent`
## Guardrails
* Default: propose actions; only auto-execute in explicitly configured “Autopilot scopes.”
* Every action must support:
  * idempotency key
  * audit fields (why, ticket/run linkage)
  * reversible/compensating action where feasible
## KPIs
* % actions proposed vs executed
* “Policy prevented incident” count
* Approval latency and action success rate
---
# ADVISORY-AI-003 — Ops Memory (structured, durable, queryable)
## Problem
Teams repeat incidents because knowledge lives in:
* chat logs
* tribal memory
* scattered tickets
* unwritten heuristics
Chat history is not an operational knowledge base: it's unstructured and hard to reuse safely.
## Why we do it
Ops memory reduces repeat work and accelerates diagnosis. It also becomes a defensible dataset because it's tied to your Runs, artifacts, and outcomes.
## What we ship
A set of typed memory objects (not messages):
* `DecisionRecord`
* `KnownIssue`
* `Tactic`
* `Constraint`
* `PostmortemSummary`
Memory is written on:
* Run closure
* approvals (policy events)
* explicit “save as org memory” actions
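A sketch of how these typed memory objects might be modeled, using illustrative field names (scope, confidence, supersession, expiry) drawn from the guardrails below:
```ts
// Illustrative typed memory entries; not the actual StellaOps.MemoryStore schema.
type MemoryKind = "DecisionRecord" | "KnownIssue" | "Tactic" | "Constraint" | "PostmortemSummary";

interface MemoryEntry {
  id: string;
  kind: MemoryKind;
  scope: { service?: string; environment?: string; team?: string };
  confidence: "verified" | "anecdotal";
  sourceRunId: string;   // memory is written from Run artifacts, not chat messages
  supersededBy?: string; // conflict handling: newer entries supersede older ones
  expiresAt?: string;    // review/expiry policy for tactics and constraints
  body: string;
}

// Retrieval: only surface in-scope entries that are neither superseded nor expired.
function relevant(entries: MemoryEntry[], service: string, now: Date): MemoryEntry[] {
  return entries.filter(
    (e) =>
      (e.scope.service === undefined || e.scope.service === service) &&
      e.supersededBy === undefined &&
      (e.expiresAt === undefined || new Date(e.expiresAt) > now)
  );
}
```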
## How we achieve (modules + UI)
### Backend modules
* `StellaOps.AdvisoryAI.Memory`
  * Write: extract structured memory from run artifacts
  * Read: retrieve memory relevant to current context (service/env/symptoms)
  * Conflict handling: “superseded by”, timestamps, confidence
* `StellaOps.MemoryStore` (Postgres tables + full-text index as needed)
### UI components
* `MemoryPanelComponent` (contextual suggestions during a run)
* `MemoryBrowserComponent` (search + filters)
* `MemoryDiffComponent` (when superseding prior memory)
## Guardrails
* Memory entries have:
  * scope (service/env/team)
  * confidence (verified vs anecdotal)
  * review/expiry policies for tactics/constraints
* Never “learn” from unresolved or low-confidence runs by default.
## KPIs
* Repeat incident rate reduction
* Time-to-diagnosis delta when memory exists
* Memory reuse rate inside Runs
---
# ADVISORY-AI-004 — Playbook Learning (Run → Patch → Approved Playbook)
## Problem
Runbooks/playbooks drift. Operators improvise. The playbook never improves, and the organization pays the same “tuition” repeatedly.
## Why we do it
Playbook learning is the compounding loop that turns daily operations into a proprietary advantage. Competitors can generate playbooks; they struggle to continuously improve them from real run traces with review + governance.
## What we ship
* Versioned playbooks as structured objects
* **Playbook Patch** proposals generated from Run traces:
  * coverage patches, repair patches, optimization patches, safety patches, detection patches
* Owner review + approval workflow
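A minimal sketch of a Playbook Patch proposal as a structured object; the patch kinds mirror the list above, and the remaining fields are illustrative assumptions:
```ts
// Illustrative playbook patch proposal; canonical playbooks are never edited directly.
type PatchKind = "coverage" | "repair" | "optimization" | "safety" | "detection";

interface PlaybookStepChange {
  op: "add" | "modify" | "remove";
  stepId?: string;        // existing step being changed, if any
  proposedStep?: string;  // human-readable step text
  evidenceRefs: string[]; // each proposed step must link to Run evidence
}

interface PlaybookPatch {
  playbookId: string;
  baseVersion: number;    // playbook version the patch applies to
  kind: PatchKind;
  scope: "generalizable" | "context-specific"; // prevents one-off contamination
  sourceRunId: string;    // the Run trace the patch was learned from
  changes: PlaybookStepChange[];
  status: "proposed" | "approved" | "rejected";
}
```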
## How we achieve (modules + UI)
### Backend modules
* `StellaOps.Playbooks`
  * Playbook schema + versioning
* `StellaOps.AdvisoryAI.PlaybookLearning`
  * Extract “what we did” from Run timeline
  * Compare to playbook steps
  * Propose a patch with evidence links
* `StellaOps.DiffService`
  * Human-friendly diff output for UI
### UI components
* `PlaybookPatchCardComponent`
* `DiffViewerComponent` (Monaco diff or equivalent)
* `PlaybookApprovalFlowComponent`
* `PlaybookCoverageHeatmapComponent` (optional, later)
## Guardrails
* Never auto-edit canonical playbooks; only patches + review.
* Require evidence links for each proposed step.
* Prevent one-off contamination by marking patches as:
  * “generalizable” vs “context-specific”
## KPIs
* % incidents with a playbook
* Patch acceptance rate
* MTTR improvement for playbook-backed incidents
---
# ADVISORY-AI-005 — Integration Concierge (setup + health + “how-to” that is actually correct)
## Problem
Integrations are where tools die:
* users ask “how do I integrate X”
* assistant answers generically
* setup fails because of environment constraints, permissions, webhooks, scopes, retries, or missing prerequisites
* no one can debug it later
## Why we do it
Integration handling becomes a moat when it is:
* deterministic (wizard truth)
* auditable (events + actions traced)
* self-healing (retries, backfills, health checks)
* explainable (precise steps, not generic docs)
## What we ship
1. **Integration Setup Wizard** per provider (GitLab, Jira, Slack, etc.)
2. **Integration Health** dashboard:
   * last event received
   * last action executed
   * failure reasons + next steps
   * token expiry warnings
3. **Chat-driven guidance** that drives the same wizard backend:
   * when a user asks “how to integrate GitLab,” Stella replies with the exact steps for the instance type, auth mode, and required permissions, and can pre-fill a setup plan.
## How we achieve (modules + UI)
### Backend modules
* `StellaOps.Integrations`
  * Provider contracts: inbound events + outbound actions
  * Normalization into Stella `Signals` and `Actions`
* `StellaOps.Integrations.Reliability`
  * Webhook dedupe, replay, dead-letter, backfill polling
* `StellaOps.AdvisoryAI.Integrations`
  * Retrieves provider-specific setup templates
  * Asks only for missing parameters
  * Produces a “setup checklist” artifact attached to a Run or Integration record
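A small sketch of the reliability layer's dedupe and backfill idea; the `deliveryId` field and the in-memory seen-set are assumptions for illustration, while real storage, dead-lettering, and backfill would live in `StellaOps.Integrations.Reliability`:
```ts
// Illustrative webhook dedupe: providers may deliver the same event more than once.
interface InboundEvent {
  provider: "gitlab" | "jira" | "slack";
  deliveryId: string; // provider-supplied delivery identifier (assumed to exist)
  receivedAt: string;
  payload: unknown;   // stored (with redaction) for replay and debugging
}

const seen = new Set<string>();

// Returns true if the event is new and should be normalized into Signals/Actions.
function accept(event: InboundEvent): boolean {
  const key = `${event.provider}:${event.deliveryId}`;
  if (seen.has(key)) return false; // duplicate delivery: drop, but keep a log entry
  seen.add(key);
  return true;
}

// Webhooks are never perfectly lossless, so a periodic backfill poll reconciles gaps.
function needsBackfill(lastEventAt: Date, now: Date, maxSilenceMinutes = 30): boolean {
  return (now.getTime() - lastEventAt.getTime()) / 60000 > maxSilenceMinutes;
}
```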
### UI components
* `IntegrationWizardComponent`
* `IntegrationHealthComponent`
* `IntegrationEventLogComponent` (raw payload headers + body stored securely)
* `SetupChecklistArtifactComponent` (generated by AdvisoryAI)
## Guardrails
* Store inbound webhook payloads for replay/debug, with redaction where required.
* Always support reconciliation/backfill (webhooks are never perfectly lossless).
* Use least-privilege token scopes by default, with clear permission error guidance.
## KPIs
* Time-to-first-successful-event
* Integration “healthy” uptime
* Setup completion rate without human support
---
# ADVISORY-AI-006 — Outcome Analytics (prove ROI with credible attribution)
## Problem
AI features are easy to cut in budgeting because value is vague. “It feels faster” doesn't survive scrutiny.
## Why we do it
Outcome analytics makes Stella defensible to leadership and helps prioritize what to automate next. It also becomes a dataset for continuous improvement.
## What we ship
* Baseline metrics (before Stella influence):
  * MTTA, MTTR, escalation count, repeat incidents, deploy failure rate (as relevant)
* Attribution model (only count impact when Stella materially contributed):
  * playbook patch accepted
  * evidence pack used
  * policy-gated action executed
  * memory entry reused
* Monthly/weekly impact reports
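A sketch of the attribution rule above (only count impact when Stella materially contributed); the contribution kinds come from the list, while the join and averaging logic are illustrative assumptions:
```ts
// Illustrative attribution check over outcomes joined to the Run ledger.
type ContributionKind =
  | "playbook-patch-accepted"
  | "evidence-pack-used"
  | "policy-gated-action-executed"
  | "memory-entry-reused";

interface RunOutcome {
  runId: string;
  mttrMinutes: number;
  contributions: ContributionKind[]; // joined from AI artifacts and actions in the ledger
}

// Only runs with at least one material contribution count toward Stella-attributed impact.
function attributed(outcomes: RunOutcome[]): RunOutcome[] {
  return outcomes.filter((o) => o.contributions.length > 0);
}

// Report both cohorts so the dashboard shows correlation, not claimed causation.
function mttrDelta(outcomes: RunOutcome[]): { withStella: number; without: number } {
  const avg = (xs: RunOutcome[]) =>
    xs.length === 0 ? 0 : xs.reduce((s, o) => s + o.mttrMinutes, 0) / xs.length;
  return {
    withStella: avg(attributed(outcomes)),
    without: avg(outcomes.filter((o) => o.contributions.length === 0)),
  };
}
```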
## How we achieve (modules + UI)
### Backend modules
* `StellaOps.Analytics`
  * Metric computation + cohorts (by service/team/severity)
* `StellaOps.AdvisoryAI.Attribution`
  * Joins outcomes to AI artifacts and actions in the Run ledger
* `StellaOps.Reporting`
  * Scheduled report generation (exportable)
### UI components
* `OutcomeDashboardComponent`
* `AttributionBreakdownComponent`
* `ExecutiveReportExportComponent`
## Guardrails
* Avoid vanity metrics (“number of chats”).
* Always show confidence/limitations in attribution (correlation vs causation).
## KPIs
* MTTR delta (with Stella artifacts vs without)
* Repeat incident reduction
* Escalation reduction
* Automation coverage growth
---
## One unifying implementation note: “AdvisoryAI” should output objects, not prose
To make all seven advisories work together, standardize on a small set of **AI-produced artifacts**:
* `Plan` (step list with tools/actions)
* `EvidencePack`
* `DecisionRecord`
* `PlaybookPatch`
* `IntegrationSetupChecklist`
* `RunSummary` (postmortem-ready)
Every artifact is:
* versioned
* evidence-linked
* attached to a Run
* subject to policy gates when it triggers actions
This gives you:
* auditability
* deterministic replay of the *inputs and tool outputs*
* compounding “Ops memory” and “Playbook learning” data
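A sketch of the common envelope all of these artifacts could share, with illustrative field names; only the six artifact kinds come from the list above:
```ts
// Illustrative common envelope for AI-produced artifacts.
type ArtifactKind =
  | "Plan"
  | "EvidencePack"
  | "DecisionRecord"
  | "PlaybookPatch"
  | "IntegrationSetupChecklist"
  | "RunSummary";

interface AiArtifact<TBody = unknown> {
  kind: ArtifactKind;
  version: number;            // versioned
  runId: string;              // attached to a Run
  evidenceRefs: string[];     // evidence-linked
  policyGated: boolean;       // true when the artifact can trigger actions
  promptTemplateHash: string; // supports deterministic replay of inputs and tool outputs
  body: TBody;
}
```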
---
## Example: how this maps to “AdvisoryAI module + Chat UI component”
A coherent minimum viable architecture:
**Backend**
* `StellaOps.AdvisoryAI.WebService` (orchestrator)
* `StellaOps.AdvisoryAI.Evidence` (citations)
* `StellaOps.AdvisoryAI.Memory` (structured memory)
* `StellaOps.AdvisoryAI.PlaybookLearning` (patch proposals)
* `StellaOps.AdvisoryAI.Integrations` (setup guidance + checklists)
* `StellaOps.PolicyEngine` + `StellaOps.ActionRegistry` (gated actions)
* `StellaOps.RunLedger` + `StellaOps.EvidenceStore` (audit + snapshots)
**UI**
* AI chat panel that emits/accepts **artifact cards**
* Evidence side panel
* Approval modal
* Diff viewer for patches
* Integration wizard + health
---
If you want, I can translate these advisories into a single internal spec pack:
* database tables (Postgres)
* event contracts (signals/actions)
* JSON schemas for artifacts/actions
* UI navigation and component tree
* the first 10 “golden workflows” you should ship with the Workbench


@@ -0,0 +1,114 @@
## Stella Ops Suite card
### What Stella Ops Suite is
**Stella Ops Suite is a centralized, auditable release control plane for non-Kubernetes container estates.**
It sits between your CI and your runtime targets, governs **promotion across environments**, enforces **security + policy gates**, and produces **verifiable evidence** for every release decision, while remaining **plugin-friendly** to any SCM/CI/registry/secrets stack.
### What it does
* **Release orchestration (non-K8s):** UI-driven promotion (Dev → Stage → Prod), approvals, policy gates, rollbacks; steps are **hookable with scripts** (and/or step providers).
* **Security decisioning as a gate:** scan on **build**, evaluate on **release**, and **re-evaluate** on vulnerability intelligence updates without forcing re-scans of the same artifact.
* **OCI-digest first:** treats a release as an immutable **digest** (or bundle of digests) and tracks “what is deployed where” with integrity.
* **Toolchain-agnostic integrations:** plug into any **SCM / repo**, any **CI**, any **registry**, and any **secrets** system; customers can reuse what they already run.
* **Auditability + standards:** audit log + evidence packets (exportable), SBOM/VEX/attestation-friendly, standards-first approach.
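A minimal sketch of the gating idea described above (build-time evidence cached by immutable digest, re-evaluated against updated vulnerability intelligence, with an exportable decision record); the shapes and the `evaluate` function are illustrative assumptions, not the product's API:
```ts
// Illustrative release gate: scan results are cached per OCI digest, so promotion
// reuses build-time evidence and only re-evaluates policy when intel or policy changes.
interface CachedScan {
  digest: string;       // sha256 digest that identifies the release
  sbomRef: string;      // SBOM produced at build time
  findings: { cve: string; reachable: boolean; vexStatus?: "not_affected" | "fixed" | "affected" }[];
  intelVersion: string; // vulnerability feed snapshot the findings were computed against
}

interface GateDecision {
  digest: string;
  environment: "dev" | "stage" | "prod";
  verdict: "allow" | "block";
  reasons: string[];    // "why blocked?" explainability
  policyHash: string;   // inputs + policy hash make the decision reproducible
}

function evaluate(scan: CachedScan, env: "dev" | "stage" | "prod", policyHash: string): GateDecision {
  // Reachability-aware prioritization: only reachable, unresolved findings block prod.
  const blocking = scan.findings.filter(
    (f) => f.reachable && f.vexStatus !== "not_affected" && f.vexStatus !== "fixed"
  );
  const verdict = env === "prod" && blocking.length > 0 ? "block" : "allow";
  return {
    digest: scan.digest,
    environment: env,
    verdict,
    reasons: blocking.map((f) => `${f.cve} is reachable and unresolved`),
    policyHash,
  };
}
```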
### Core strengths
* **Non-Kubernetes specialization:** Docker hosts/Compose/ECS/Nomad-style targets are first-class, not an afterthought.
* **Reproducibility:** deterministic release decisions captured as evidence (inputs + policy hash + verdict + approvals).
* **Attestability:** produces and verifies release evidence/attestations (provenance, SBOM linkage, decision records) in standard formats.
* **Verity (integrity):** digest-based release identity; signature/provenance verification; tamper-evident audit trail.
* **Hybrid reachability:** reachability-aware vulnerability prioritization (static + “hybrid” signals) to reduce noise and focus on exploitable paths.
* **Cost that doesn't punish automation:** no per-project tax, no per-seat tax, no “deployments bill.” Limits are **only**:
**(1) number of environments** and **(2) number of new digests analyzed per day.**
---
# Why Stella wins vs competitors (in one line each)
* **CI/CD tools** (Actions/Jenkins/GitLab CI): great at *running pipelines*, weak at being a **central release authority across environments + registries + targets** with audit-grade evidence and security decisioning as a gate.
* **Deployment tools / CD orchestrators** (Octopus/Harness/Spinnaker/CloudBees): strong promotions, but security depth (reachability, attestations, continuous re-evaluation) is often **bolt-on**, and pricing often scales poorly (projects/services/users).
* **Docker registries / artifact platforms** (Harbor/JFrog/Docker registry ecosystems): can store + scan images, but don't provide a **release governance control plane** (promotion workflows, approvals, policy reasoning, deploy execution across targets).
* **Vulnerability scanners / CNAPP** (Trivy/Snyk/Aqua/Anchore/etc.): can scan well, but do not provide **release orchestration + promotion governance + deploy execution** with a single evidence ledger.
---
# Feature table: Stella vs “typical” alternatives (detailed)
**Legend:**
* **Native** = built-in, first-class
* **Partial** = exists but not release-centric / limited scope
* **Via integration** = possible but not owned end-to-end
* **N/A** = not a focus of that tool category
* **Varies** = depends heavily on vendor/edition/plugins
| Feature area | Stella Ops Suite (Release + Security Control Plane) | CI/CD tools (Actions/Jenkins/GitLab CI) | CD/Deploy orchestrators (Octopus/Harness/Spinnaker) | Registries / artifact platforms (Harbor/JFrog/Docker) | Scanners / CNAPP (Trivy/Snyk/Aqua/Anchore/etc.) |
| ------------------------------------------------------------------- | -------------------------------------------------------------------------------- | --------------------------------------- | -------------------------------------------------------- | ----------------------------------------------------- | ----------------------------------------------- |
| **Primary abstraction** | **Release by OCI digest** + environment promotion | Pipeline run / job | Release / deployment pipeline | Artifact/image repo | Scan report / project |
| **Non-K8s container focus** | **Native** (Docker/ECS/Nomad style) | Partial (scripts can deploy anywhere) | Partial (often broad incl. K8s) | Native for registries; not deploy | N/A |
| **Environment model** (Dev/Stage/Prod) | **Native** (envs are first-class) | Partial (vendor-dependent env tracking) | **Native** | Partial (some repos have “projects,” not env) | N/A |
| **Promotion workflow** (Dev→Prod) | **Native** | Via integration / custom pipeline | **Native** | N/A | N/A |
| **Approvals / manual gates** | **Native** | Partial (manual steps exist) | **Native** | N/A | N/A |
| **Separation of duties** (policy) | **Native** (policy-driven) | Partial / varies | Partial / varies | N/A | N/A |
| **Freeze windows / release windows** | Native (policy-driven) | Varies | Varies | N/A | N/A |
| **Deployment execution** to targets | **Native** (agents + target adapters) | Via scripts | **Native** | N/A | N/A |
| **Rollback / redeploy same digest** | **Native** | Via scripts | **Native** | N/A | N/A |
| **Target inventory** (hosts/services) | **Native** | N/A | Partial (depends) | N/A | N/A |
| **Scriptable step hooks** | **Native** (hooks everywhere) | Native (pipelines are scripts) | **Native/Partial** (often supported) | N/A | Partial (hooks in CI) |
| **Pluggable connectors** (SCM/CI/registry) | **Native design goal** (reuse customer stack) | N/A (they *are* the CI) | Partial | Partial | Partial |
| **Registry-neutral operation** | **Native** (works with any registry; can reuse) | Via scripts | Via integration | Registry-centric | N/A |
| **Release gating based on security** | **Native** (scanner verdict is a gate) | Via integration | Via integration | Partial (policy usually at pull time) | N/A (scanner doesn't deploy) |
| **Scan timing: build-time** | **Native** (CI integration) | Via integration | Via integration | Partial | **Native** |
| **Scan timing: release-time** | **Native** (gate uses cached evidence) | Via integration | Via integration | Partial | Partial |
| **Scan timing: CVE update re-evaluation** | **Native** (continuous re-eval) | Rare / custom | Rare / custom | Partial (platform dependent) | Varies (often supported) |
| **New-digest accounting** (don't charge for redeploys) | **Native (digest-cache first)** | N/A | N/A | N/A | Varies |
| **SBOM generation** | **Native** | Via integration | Via integration | Partial | **Native/Partial** |
| **VEX support** (clarify not-affected/fixed) | **Native** (standards-first) | Via integration | Via integration | Partial | Varies |
| **Reachability analysis** | **Native** (incl. hybrid reachability) | Via integration | Via integration | Rare | Varies (often not reachability) |
| **Hybrid reachability** (static + optional runtime signals) | **Native** | N/A | N/A | N/A | Rare |
| **Exploit intelligence / prioritization** (KEV-like, etc.) | Native / planned (as decision inputs) | Via integration | Via integration | Partial | Varies |
| **Backport / fix verification** | Native / planned (noise reduction) | N/A | N/A | N/A | Rare |
| **Attestability** (produce attestations/evidence) | **Native** (evidence packet export) | Partial | Partial | Partial | Partial |
| **Verity** (signature/provenance verification) | **Native** (enforce verifiable releases) | Via integration | Via integration | Partial (registry dependent) | Partial |
| **Reproducibility** (replayable decision/evidence) | **Native** (policy+inputs hashed) | Rare | Rare | N/A | N/A |
| **Central audit ledger** (who/what/why) | **Native** | Partial (logs exist, not unified) | Partial (deployment logs) | Partial (artifact logs) | Partial (scan logs) |
| **“Why blocked?” explainability** | **Native** (decision reasons + evidence refs) | Varies | Varies | Varies | Varies |
| **Multi-toolchain governance** (one control plane over many stacks) | **Native** | No (each CI silo) | Partial | No (registry silo) | No (scanner silo) |
| **Open-source extensibility** | **Native** (OSS agents/connectors, paid core) | Native OSS for some (Jenkins) | Varies | Varies | Varies |
| **Pricing pain point** | **No per-seat / per-project / per-deploy tax** | Often per-seat or usage | Often per-project/service/user | Often storage/traffic/consumption | Often per-seat / consumption |
| **Best fit** | Non-K8s container teams needing centralized, auditable releases + security gates | Teams wanting pipeline automation | Teams wanting deployment automation (security bolted on) | Teams needing artifact storage + basic scanning | Teams needing scanning, not orchestration |
**Interpretation:** Stella is not trying to “replace CI” or “be a registry.” It is the **release integrity layer** that (a) makes promotion decisions, (b) executes deployments to non-K8s container targets, and (c) produces verifiable evidence for audit and reproducibility, while reusing the customer's existing SCM/CI/registry.
---
# Stella pricing proposal (all features included; only scale limits)
**Pricing principle:**
You pay for **(1) environments** and **(2) new artifact digests analyzed per day**.
Deployments/promotions are unlimited (fair use), and **re-evaluation on CVE updates is included** and does not consume “new digest analyses.”
| Plan | Price | Environments | New digests analyzed/day | What's included |
| ----------------------------------------------- | -----------------: | -----------: | -----------------------: | ------------------------------------------------------------------------------------- |
| **Free + Registration** (monthly token renewal) | $0 | 3 | 333 | Full suite features, unlimited deployments (fair use), evidence + audit, integrations |
| **Pro** | **$699 / month** | 33 | 3333 | Same features |
| **Enterprise** | **$1,999 / month** | Unlimited | Unlimited | Same features, “no hard limits,” fair use on mirroring/audit-confirmation bandwidth |
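A small sketch of how the “new digests analyzed per day” limit could be metered under these assumptions: redeploys of an already-analyzed digest and CVE-driven re-evaluations hit the digest cache and consume nothing.
```ts
// Illustrative metering for the "new digests analyzed per day" limit.
const analyzed = new Set<string>(); // digests with cached analysis (digest-cache first)
let analyzedToday = 0;

// Decides whether a fresh analysis is needed and whether it counts against the daily limit.
function requestAnalysis(digest: string, dailyLimit: number): "cached" | "analyzed" | "over-limit" {
  if (analyzed.has(digest)) return "cached"; // redeploys and re-evaluations are free
  if (analyzedToday >= dailyLimit) return "over-limit";
  analyzed.add(digest);
  analyzedToday += 1;                        // only genuinely new digests consume quota
  return "analyzed";
}
```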
### “Fair use” (make it explicit so it's credible)
* Unlimited deployments/promotions assume normal operational usage (no abusive tight-loop triggers).
* “Unlimited” in Enterprise is protected by fair use for:
  * vulnerability feed mirroring bandwidth and update frequency
  * audit confirmation / evidence export traffic spikes
  * storage growth beyond reasonable bounds (offer storage retention controls)
---
# Short “elevator pitch” for the card (copy-ready)
**Stella Ops Suite** gives non-Kubernetes container teams a **central release authority**: it orchestrates environment promotions, gates releases using **reachability-aware security** and policy, and produces **verifiable, auditable evidence** for every decision, without charging per project, per seat, or per deployment.
If you want, I can compress this into a true one-page “sales card” layout (same content, but formatted exactly like a procurement-ready PDF/one-pager), and a second version tailored to your best ICP (Docker host fleets vs ECS-heavy teams).