Below is a cohesive set of **7 product advisories** that together define an “AI-native” Stella Ops with defensible moats. Each advisory follows the same structure:

* **Problem** (what hurts today)
* **Why** (why Stella should solve it)
* **What we ship** (capabilities, boundaries)
* **How we achieve** (proposed `AdvisoryAI` backend modules + key UI components)
* **Guardrails** (safety / trust / determinism)
* **KPIs** (how you prove it works)

I’m assuming your canonical object model already includes **Runs** (incident/escalation/change investigation runs) and a system of record in **PostgreSQL**, with **Valkey** as a non-authoritative accelerator.

---

# ADVISORY-AI-000 — AdvisoryAI Foundation: Chat + Workbench + Runs (the “AI OS surface”)

## Problem

Most “AI in ops” fails because it is only a chat box. Chat is not:

* auditable
* repeatable
* actionable with guardrails
* collaborative (handoffs, approvals, artifacts)

Operators need a place where AI output becomes **objects** (runs, decisions, patches, evidence packs), not ephemeral text.

## Why we do it

This advisory is the substrate for all the other moats. Without it, the other features remain demos.

## What we ship

1. **AdvisoryAI Orchestrator** that can:
   * read Stella objects (runs, services, policies, evidence)
   * propose plans
   * call tools/actions (within policy)
   * produce structured artifacts (patches, decision records, evidence packs)
2. **AI Workbench UI**:
   * Chat panel for intent
   * Artifact cards (Run, Playbook Patch, Decision, Evidence Pack)
   * Run Timeline view (what happened, tool calls, approvals, outputs)

## How we achieve (modules + UI)

### Backend modules (suggested)

* `StellaOps.AdvisoryAI.WebService`
  * Conversation/session orchestration
  * Tool routing + action execution requests
  * Artifact creation (Run notes, patches, decisions)
* `StellaOps.AdvisoryAI.Prompting`
  * Prompt templates versioned + hashed
  * Guarded system prompts per “mode”
* `StellaOps.AdvisoryAI.Tools`
  * Tool contracts (read-only queries, action requests)
* `StellaOps.AdvisoryAI.Eval`
  * Regression tests for tool correctness + safety

### UI components

* `AiChatPanelComponent`
* `AiArtifactCardComponent` (Run / Decision / Patch / Evidence Pack)
* `RunTimelineComponent` (with “AI steps” and “human steps”)
* `ModeSelectorComponent` (Analyst / Operator / Autopilot)

### Canonical flow

```
User intent (chat)
  -> AdvisoryAI proposes plan (steps)
  -> executes read-only tools
  -> generates artifact(s)
  -> requests approvals for risky actions
  -> records everything on the Run timeline
```

## Guardrails

* Every AI interaction writes to a **Run** (or attaches to an existing Run).
* Prompt templates are **versioned + hashed**.
* Tool calls and outputs are **persisted** (for audit and replay).

## KPIs

* % of AI sessions attached to Runs
* “Time to first useful artifact”
* Operator adoption (weekly active users of the Workbench)

---

# ADVISORY-AI-001 — Evidence-First Outputs (trust-by-construction)

## Problem

In ops, an answer without evidence is a liability. LLMs are persuasive even when they are wrong, so operators waste time verifying claims or, worse, act on incorrect ones.

## Why we do it

Evidence-first output is the trust prerequisite for:

* automation
* playbook learning
* org memory
* executive reporting

## What we ship

* A **Claim → Evidence** constraint:
  * Each material claim must be backed by an `EvidenceRef` (query snapshot, ticket, pipeline run, commit, config state).
* An **Evidence Pack** artifact:
  * A shareable bundle of evidence for an incident/change/review.
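Before moving to modules, here is a minimal sketch of how the Claim → Evidence constraint could be typed and enforced, assuming a TypeScript artifact model on the orchestrator side; all names are illustrative assumptions, not an existing Stella schema:

```typescript
// Illustrative only: these names are assumptions, not an existing Stella schema.

/** A pointer to verifiable evidence captured when the answer is produced. */
interface EvidenceRef {
  kind: "query_snapshot" | "ticket" | "pipeline_run" | "commit" | "config_state";
  uri: string;                          // where the snapshot lives (e.g. an EvidenceStore pointer)
  contentHash: string;                  // hash of the captured payload, for tamper evidence
  capturedAt: string;                   // ISO-8601 timestamp of the snapshot
  queryParams?: Record<string, string>; // inputs used to produce the snapshot
}

/** One material claim extracted from a drafted answer. */
interface Claim {
  text: string;
  confidence: "verified" | "inferred" | "unknown";
  evidence: EvidenceRef[];              // empty list => confidence must be downgraded
}

/** Enforce the Claim -> Evidence constraint before the answer is rendered. */
function enforceEvidenceFirst(claims: Claim[]): Claim[] {
  return claims.map((claim): Claim =>
    claim.evidence.length > 0
      ? claim
      : { ...claim, confidence: "unknown" } // never assert certainty without evidence
  );
}
```

The point of the sketch is that the downgrade is mechanical: a claim without an `EvidenceRef` can never ship as “verified”, regardless of how confident the model sounds.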
## How we achieve (modules + UI)

### Backend modules

* `StellaOps.AdvisoryAI.Evidence`
  * Claim extraction from model output
  * Evidence retrieval + snapshotting
  * Citation enforcement (or downgrade of claim confidence)
* `StellaOps.EvidenceStore`
  * Immutable (or content-addressed) snapshots
  * Hashes, timestamps, query parameters

### UI components

* `EvidenceSidePanelComponent` (opens from inline citations)
* `EvidencePackViewerComponent`
* `ConfidenceBadgeComponent` (Verified / Inferred / Unknown)

### Implementation pattern

For each answer:

1. Draft the response
2. Extract claims
3. Attach evidence refs
4. If evidence is missing: label the claim as uncertain + propose verification steps

## Guardrails

* If evidence is missing, Stella must **not** assert certainty.
* Evidence snapshots must capture:
  * query inputs
  * time range
  * raw result (or hash + storage pointer)

## KPIs

* Citation coverage (% of answers with evidence refs)
* Reduced back-and-forth (“how do you know?” rate)
* Adoption of automation after the evidence-first rollout

---

# ADVISORY-AI-002 — Policy-Aware Automation (safe actions, not just suggestions)

## Problem

The main blocker to “AI that acts” is governance:

* wrong environment
* insufficient permission
* missing approvals
* non-idempotent actions
* unclear accountability

## Why we do it

If Stella can’t safely execute actions, it will remain a read-only assistant. Policy-aware automation is a hard moat because it requires real engineering discipline and operational maturity.

## What we ship

* A typed **Action Registry**:
  * schemas, risk levels, idempotency, rollback/compensation
* A **Policy decision point** (PDP) before any action:
  * allow / allow-with-approvals / deny (a minimal gate sketch appears at the end of this advisory)
* An **Approval workflow** linked to Runs

## How we achieve (modules + UI)

### Backend modules

* `StellaOps.ActionRegistry`
  * Action definitions + schemas + risk metadata
* `StellaOps.PolicyEngine`
  * Rules: environment protections, freeze windows, role constraints
* `StellaOps.AdvisoryAI.Automation`
  * Converts intent → action proposals
  * Submits action requests after approvals
* `StellaOps.RunLedger`
  * Every action request + result is a ledger entry

### UI components

* `ActionProposalCardComponent`
* `ApprovalModalComponent` (scoped approval: this action / this run / this window)
* `PolicyExplanationComponent` (human-readable “why allowed/denied”)
* `RollbackPanelComponent`

## Guardrails

* Default: propose actions; auto-execute only in explicitly configured “Autopilot scopes.”
* Every action must support:
  * an idempotency key
  * audit fields (why, ticket/run linkage)
  * a reversible/compensating action where feasible

## KPIs

* % of actions proposed vs executed
* “Policy prevented incident” count
* Approval latency and action success rate
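To ground the PDP and idempotency requirements, here is a minimal sketch of a policy gate over typed action proposals, assuming the Action Registry exposes risk metadata; the names are illustrative, not an existing Stella API:

```typescript
// Illustrative only: names are assumptions, not an existing Stella API.

type Environment = "dev" | "staging" | "prod";
type PolicyDecision = "allow" | "allow_with_approvals" | "deny";

interface ActionProposal {
  actionType: string;       // must exist in the Action Registry
  environment: Environment;
  riskLevel: "low" | "medium" | "high";
  idempotencyKey: string;   // required so retries cannot double-execute
  runId: string;            // audit linkage back to the Run
  reason: string;           // human-readable "why"
}

/** Policy decision point: evaluate a proposal before any execution is attempted. */
function decide(proposal: ActionProposal, freezeWindowActive: boolean): PolicyDecision {
  if (freezeWindowActive && proposal.environment === "prod") {
    return "deny";                  // freeze windows protect production
  }
  if (proposal.environment === "prod" || proposal.riskLevel === "high") {
    return "allow_with_approvals";  // risky scopes always require a human approval
  }
  return "allow";                   // low-risk, non-prod actions inside Autopilot scopes
}
```

The important property is that the same gate runs for chat-proposed and autopilot actions, and every decision, approval, and result lands as an entry in `StellaOps.RunLedger`.

---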
# ADVISORY-AI-003 — Ops Memory (structured, durable, queryable)

## Problem

Teams repeat incidents because knowledge lives in:

* chat logs
* tribal memory
* scattered tickets
* unwritten heuristics

Chat history is not an operational knowledge base: it’s unstructured and hard to reuse safely.

## Why we do it

Ops memory reduces repeat work and accelerates diagnosis. It also becomes a defensible dataset because it’s tied to your Runs, artifacts, and outcomes.

## What we ship

A set of typed memory objects (not messages):

* `DecisionRecord`
* `KnownIssue`
* `Tactic`
* `Constraint`
* `PostmortemSummary`

Memory is written on:

* Run closure
* approvals (policy events)
* explicit “save as org memory” actions

## How we achieve (modules + UI)

### Backend modules

* `StellaOps.AdvisoryAI.Memory`
  * Write: extract structured memory from run artifacts
  * Read: retrieve memory relevant to the current context (service/env/symptoms)
  * Conflict handling: “superseded by” links, timestamps, confidence
* `StellaOps.MemoryStore` (Postgres tables + full-text index as needed)

### UI components

* `MemoryPanelComponent` (contextual suggestions during a run)
* `MemoryBrowserComponent` (search + filters)
* `MemoryDiffComponent` (when superseding prior memory)

## Guardrails

* Memory entries have:
  * scope (service/env/team)
  * confidence (verified vs anecdotal)
  * review/expiry policies for tactics and constraints
* Never “learn” from unresolved or low-confidence runs by default.

## KPIs

* Repeat-incident rate reduction
* Time-to-diagnosis delta when relevant memory exists
* Memory reuse rate inside Runs

---

# ADVISORY-AI-004 — Playbook Learning (Run → Patch → Approved Playbook)

## Problem

Runbooks/playbooks drift and operators improvise. The playbook never improves, and the organization pays the same “tuition” repeatedly.

## Why we do it

Playbook learning is the compounding loop that turns daily operations into a proprietary advantage. Competitors can generate playbooks; they struggle to continuously improve them from real run traces with review + governance.

## What we ship

* Versioned playbooks as structured objects
* **Playbook Patch** proposals generated from Run traces:
  * coverage patches, repair patches, optimization patches, safety patches, detection patches
* Owner review + approval workflow

## How we achieve (modules + UI)

### Backend modules

* `StellaOps.Playbooks`
  * Playbook schema + versioning
* `StellaOps.AdvisoryAI.PlaybookLearning`
  * Extract “what we did” from the Run timeline
  * Compare it to the playbook steps
  * Propose a patch with evidence links
* `StellaOps.DiffService`
  * Human-friendly diff output for the UI

### UI components

* `PlaybookPatchCardComponent`
* `DiffViewerComponent` (Monaco diff or equivalent)
* `PlaybookApprovalFlowComponent`
* `PlaybookCoverageHeatmapComponent` (optional, later)

## Guardrails

* Never auto-edit canonical playbooks; only patches + review.
* Require evidence links for each proposed step.
* Prevent one-off contamination by marking patches as:
  * “generalizable” vs “context-specific”

## KPIs

* % of incidents covered by a playbook
* Patch acceptance rate
* MTTR improvement for playbook-backed incidents

---

# ADVISORY-AI-005 — Integration Concierge (setup + health + “how-to” that is actually correct)

## Problem

Integrations are where tools die:

* users ask “how do I integrate X?”
* the assistant answers generically
* setup fails because of environment constraints, permissions, webhooks, scopes, retries, or missing prerequisites
* no one can debug it later

## Why we do it

Integration handling becomes a moat when it is:

* deterministic (the wizard is the source of truth)
* auditable (events + actions traced)
* self-healing (retries, backfills, health checks)
* explainable (precise steps, not generic docs)

## What we ship

1. **Integration Setup Wizard** per provider (GitLab, Jira, Slack, etc.)
2. **Integration Health** dashboard:
   * last event received
   * last action executed
   * failure reasons + next steps
   * token expiry warnings
3. **Chat-driven guidance** that drives the same wizard backend:
   * when a user asks “how to integrate GitLab,” Stella replies with the exact steps for the instance type, auth mode, and required permissions, and can pre-fill a setup plan.

## How we achieve (modules + UI)

### Backend modules

* `StellaOps.Integrations`
  * Provider contracts: inbound events + outbound actions
  * Normalization into Stella `Signals` and `Actions`
* `StellaOps.Integrations.Reliability`
  * Webhook dedupe, replay, dead-letter handling, backfill polling
* `StellaOps.AdvisoryAI.Integrations`
  * Retrieves provider-specific setup templates
  * Asks only for missing parameters
  * Produces a “setup checklist” artifact attached to a Run or Integration record

### UI components

* `IntegrationWizardComponent`
* `IntegrationHealthComponent`
* `IntegrationEventLogComponent` (raw payload headers + body stored securely)
* `SetupChecklistArtifactComponent` (generated by AdvisoryAI)

## Guardrails

* Store inbound webhook payloads for replay/debugging, with redaction where required.
* Always support reconciliation/backfill (webhook delivery is never lossless).
* Use least-privilege token scopes by default, with clear guidance on permission errors.

## KPIs

* Time to first successful event
* Integration “healthy” uptime
* Setup completion rate without human support

---

# ADVISORY-AI-006 — Outcome Analytics (prove ROI with credible attribution)

## Problem

AI features are easy to cut in budgeting because their value is vague. “It feels faster” doesn’t survive scrutiny.

## Why we do it

Outcome analytics makes Stella defensible to leadership and helps prioritize what to automate next. It also becomes a dataset for continuous improvement.

## What we ship

* Baseline metrics (before Stella influence):
  * MTTA, MTTR, escalation count, repeat incidents, deploy failure rate (as relevant)
* Attribution model (only count impact when Stella materially contributed):
  * playbook patch accepted
  * evidence pack used
  * policy-gated action executed
  * memory entry reused
* Monthly/weekly impact reports

## How we achieve (modules + UI)

### Backend modules

* `StellaOps.Analytics`
  * Metric computation + cohorts (by service/team/severity)
* `StellaOps.AdvisoryAI.Attribution`
  * Joins outcomes to AI artifacts and actions in the Run ledger (see the sketch after the guardrails below)
* `StellaOps.Reporting`
  * Scheduled report generation (exportable)

### UI components

* `OutcomeDashboardComponent`
* `AttributionBreakdownComponent`
* `ExecutiveReportExportComponent`

## Guardrails

* Avoid vanity metrics (“number of chats”).
* Always show the confidence and limitations of attribution (correlation vs causation).
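Here is a hedged sketch of the attribution join referenced above, assuming ledger entries record which AI artifact (if any) materially contributed to a run; the names are illustrative, not an existing schema:

```typescript
// Illustrative only: assumes a run ledger whose entries reference accepted AI artifacts.

type ContributionKind =
  | "playbook_patch_accepted"
  | "evidence_pack_used"
  | "policy_gated_action_executed"
  | "memory_entry_reused";

interface RunLedgerEntry {
  runId: string;
  contribution?: ContributionKind; // present only when an AI artifact materially contributed
}

interface RunOutcome {
  runId: string;
  mttrMinutes: number;
}

/** Split outcomes into Stella-influenced and baseline cohorts for side-by-side reporting. */
function splitByAttribution(
  outcomes: RunOutcome[],
  ledger: RunLedgerEntry[]
): { influenced: RunOutcome[]; baseline: RunOutcome[] } {
  const influencedRuns = new Set(
    ledger.filter((entry) => entry.contribution !== undefined).map((entry) => entry.runId)
  );
  return {
    influenced: outcomes.filter((outcome) => influencedRuns.has(outcome.runId)),
    baseline: outcomes.filter((outcome) => !influencedRuns.has(outcome.runId)),
  };
}
```

Reporting the MTTR delta as a comparison between the two cohorts, rather than as a causal claim, keeps the correlation-vs-causation guardrail honest.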
## KPIs

* MTTR delta (runs with Stella artifacts vs without)
* Repeat incident reduction
* Escalation reduction
* Automation coverage growth

---

## One unifying implementation note: “AdvisoryAI” should output objects, not prose

To make all seven advisories work together, standardize on a small set of **AI-produced artifacts**:

* `Plan` (step list with tools/actions)
* `EvidencePack`
* `DecisionRecord`
* `PlaybookPatch`
* `IntegrationSetupChecklist`
* `RunSummary` (postmortem-ready)

Every artifact is:

* versioned
* evidence-linked
* attached to a Run
* subject to policy gates when it triggers actions

This gives you:

* auditability
* deterministic replay of the *inputs and tool outputs*
* compounding “Ops memory” and “Playbook learning” data

---

## Example: how this maps to “AdvisoryAI module + Chat UI component”

A coherent minimum viable architecture:

**Backend**

* `StellaOps.AdvisoryAI.WebService` (orchestrator)
* `StellaOps.AdvisoryAI.Evidence` (citations)
* `StellaOps.AdvisoryAI.Memory` (structured memory)
* `StellaOps.AdvisoryAI.PlaybookLearning` (patch proposals)
* `StellaOps.AdvisoryAI.Integrations` (setup guidance + checklists)
* `StellaOps.PolicyEngine` + `StellaOps.ActionRegistry` (gated actions)
* `StellaOps.RunLedger` + `StellaOps.EvidenceStore` (audit + snapshots)

**UI**

* AI chat panel that emits/accepts **artifact cards**
* Evidence side panel
* Approval modal
* Diff viewer for patches
* Integration wizard + health

---

If you want, I can translate these advisories into a single internal spec pack:

* database tables (Postgres)
* event contracts (signals/actions)
* JSON schemas for artifacts/actions (a minimal envelope sketch follows below)
* UI navigation and component tree
* the first 10 “golden workflows” you should ship with the Workbench
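As one concrete slice of that spec pack, here is a minimal sketch of the shared artifact envelope from the unifying note above, written in TypeScript; every field name is an assumption to be refined, not an existing Stella schema:

```typescript
// Illustrative only: one possible shared envelope for AI-produced artifacts.

type ArtifactKind =
  | "Plan"
  | "EvidencePack"
  | "DecisionRecord"
  | "PlaybookPatch"
  | "IntegrationSetupChecklist"
  | "RunSummary";

interface ArtifactEnvelope<TBody> {
  id: string;
  kind: ArtifactKind;
  version: number;             // artifacts are versioned, never edited in place
  runId: string;               // every artifact is attached to a Run
  evidenceRefIds: string[];    // links into the EvidenceStore
  requiresPolicyGate: boolean; // true when the artifact can trigger actions
  promptTemplateHash: string;  // which versioned prompt template produced it
  createdAt: string;           // ISO-8601
  body: TBody;                 // kind-specific payload (plan steps, patch diff, checklist items, ...)
}
```

Keeping the kind-specific payload behind a single generic field lets the Run ledger, policy gates, and evidence checks treat all six artifact types uniformly.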