Below is a cohesive set of 7 product advisories that together define an “AI-native” Stella Ops with defensible moats. Each advisory follows the same structure:
- Problem (what hurts today)
- Why (why Stella should solve it)
- What we ship (capabilities, boundaries)
- How we achieve (proposed AdvisoryAI backend modules + key UI components)
- Guardrails (safety / trust / determinism)
- KPIs (how you prove it works)
I’m assuming your canonical object model already includes Runs (incident/escalation/change investigation runs) and a system-of-record in PostgreSQL with Valkey as a non-authoritative accelerator.
ADVISORY-AI-000 — AdvisoryAI Foundation: Chat + Workbench + Runs (the “AI OS surface”)
Problem
Most “AI in ops” fails because it’s only a chat box. Chat is not:
- auditable
- repeatable
- actionable with guardrails
- collaborative (handoffs, approvals, artifacts)
Operators need a place where AI output becomes objects (runs, decisions, patches, evidence packs), not ephemeral text.
Why we do it
This advisory is the substrate for all other moats. Without it, your other features remain demos.
What we ship
- AdvisoryAI Orchestrator that can:
  - read Stella objects (runs, services, policies, evidence)
  - propose plans
  - call tools/actions (within policy)
  - produce structured artifacts (patches, decision records, evidence packs)
- AI Workbench UI:
  - Chat panel for intent
  - Artifact cards (Run, Playbook Patch, Decision, Evidence Pack)
  - Run Timeline view (what happened, tool calls, approvals, outputs)
How we achieve (modules + UI)
Backend modules (suggested)
- `StellaOps.AdvisoryAI.WebService`
  - Conversation/session orchestration
  - Tool routing + action execution requests
  - Artifact creation (Run notes, patches, decisions)
- `StellaOps.AdvisoryAI.Prompting`
  - Prompt templates versioned + hashed
  - Guarded system prompts per "mode"
- `StellaOps.AdvisoryAI.Tools`
  - Tool contracts (read-only queries, action requests)
- `StellaOps.AdvisoryAI.Eval`
  - Regression tests for tool correctness + safety
UI components
- `AiChatPanelComponent`
- `AiArtifactCardComponent` (Run / Decision / Patch / Evidence Pack)
- `RunTimelineComponent` (with "AI steps" and "human steps")
- `ModeSelectorComponent` (Analyst / Operator / Autopilot)
Canonical flow
User intent (chat)
-> AdvisoryAI proposes plan (steps)
-> executes read-only tools
-> generates artifact(s)
-> requests approvals for risky actions
-> records everything on Run timeline
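As a sketch of how this flow could be driven in code: the types below are illustrative assumptions (`PlanStep`, `IRunLedger`, and the rest are not existing StellaOps APIs) and only show the shape of the loop — plan steps, an approval gate for risky steps, and a ledger write per step.

```csharp
// Sketch only: illustrative types for the canonical flow above.
// None of these names are existing StellaOps APIs.
using System;
using System.Collections.Generic;

public enum StepKind { ReadOnlyTool, ArtifactGeneration, RiskyAction }

public record PlanStep(string Description, StepKind Kind);

public record Plan(Guid RunId, IReadOnlyList<PlanStep> Steps);

public interface IRunLedger
{
    void Record(Guid runId, string entry); // every step lands on the Run timeline
}

public sealed class AdvisoryOrchestrator
{
    private readonly IRunLedger _ledger;
    public AdvisoryOrchestrator(IRunLedger ledger) => _ledger = ledger;

    public void Execute(Plan plan, Func<PlanStep, bool> approvalGate)
    {
        foreach (var step in plan.Steps)
        {
            // Risky actions never run without an explicit approval decision.
            if (step.Kind == StepKind.RiskyAction && !approvalGate(step))
            {
                _ledger.Record(plan.RunId, $"BLOCKED (approval denied): {step.Description}");
                continue;
            }
            _ledger.Record(plan.RunId, $"EXECUTED: {step.Description}");
        }
    }
}
```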
Guardrails
- Every AI interaction writes to a Run (or attaches to an existing Run).
- Prompt templates are versioned + hashed.
- Tool calls and outputs are persisted (for audit and replay).
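The "versioned + hashed" guardrail can be met by content-addressing each template. A minimal sketch, assuming a hypothetical `PromptTemplate` record (not an existing StellaOps type):

```csharp
// Sketch: content-addressing a prompt template so the exact text used
// in a session can be pinned and audited. Names are illustrative.
using System;
using System.Security.Cryptography;
using System.Text;

public record PromptTemplate(string Name, int Version, string Body)
{
    // SHA-256 over name, version, and body; stored alongside every AI session.
    public string Hash => Convert.ToHexString(
        SHA256.HashData(Encoding.UTF8.GetBytes($"{Name}\n{Version}\n{Body}")));
}
```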
KPIs
- % AI sessions attached to Runs
- “Time to first useful artifact”
- Operator adoption (weekly active users of Workbench)
ADVISORY-AI-001 — Evidence-First Outputs (trust-by-construction)
Problem
In ops, an answer without evidence is a liability. LLMs are persuasive even when wrong. Operators waste time verifying or, worse, act on incorrect claims.
Why we do it
Evidence-first output is the trust prerequisite for:
- automation
- playbook learning
- org memory
- executive reporting
What we ship
- A Claim → Evidence constraint:
  - Each material claim must be backed by an `EvidenceRef` (query snapshot, ticket, pipeline run, commit, config state).
- An Evidence Pack artifact:
  - A shareable bundle of evidence for an incident/change/review.
How we achieve (modules + UI)
Backend modules
- `StellaOps.AdvisoryAI.Evidence`
  - Claim extraction from model output
  - Evidence retrieval + snapshotting
  - Citation enforcement (or downgrade claim confidence)
- `StellaOps.EvidenceStore`
  - Immutable (or content-addressed) snapshots
  - Hashes, timestamps, query parameters
UI components
- `EvidenceSidePanelComponent` (opens from inline citations)
- `EvidencePackViewerComponent`
- `ConfidenceBadgeComponent` (Verified / Inferred / Unknown)
Implementation pattern
- For each answer:
  1. Draft the response
  2. Extract claims
  3. Attach evidence refs
  4. If evidence is missing: label the claim as uncertain + propose verification steps
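A minimal sketch of the enforcement step, assuming hypothetical `Claim` and `EvidenceRef` shapes; the key property is that a claim without evidence gets labeled rather than asserted:

```csharp
// Sketch: citation enforcement for the pattern above. Illustrative types only.
using System.Collections.Generic;
using System.Linq;

public enum Confidence { Verified, Inferred, Unknown }

public record EvidenceRef(string Kind, string Pointer); // e.g. ("query-snapshot", "sha256:...")

public record Claim(string Text, List<EvidenceRef> Evidence)
{
    public Confidence Confidence =>
        Evidence.Count > 0 ? Confidence.Verified : Confidence.Unknown;
}

public static class CitationEnforcer
{
    // Claims without evidence are kept but downgraded, never silently asserted.
    public static IEnumerable<string> Render(IEnumerable<Claim> claims) =>
        claims.Select(c => c.Confidence == Confidence.Verified
            ? $"{c.Text} [{c.Evidence.Count} ref(s)]"
            : $"{c.Text} [UNVERIFIED — propose verification step]");
}
```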
Guardrails
- If evidence is missing, Stella must not assert certainty.
- Evidence snapshots must capture (sketched below):
  - query inputs
  - time range
  - raw result (or hash + storage pointer)
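A content-addressed snapshot could look like this sketch (field names are assumptions, not the actual `StellaOps.EvidenceStore` schema):

```csharp
// Sketch: what an EvidenceStore row might capture. Field names are assumptions.
using System;
using System.Security.Cryptography;
using System.Text;

public record EvidenceSnapshot(
    string QueryText,          // query inputs, exactly as executed
    DateTimeOffset From,       // time range start
    DateTimeOffset To,         // time range end
    string RawResultJson)      // raw result (for large results: storage pointer + hash)
{
    public DateTimeOffset CapturedAt { get; } = DateTimeOffset.UtcNow;

    // Content hash makes the snapshot tamper-evident and addressable.
    public string ContentHash => Convert.ToHexString(
        SHA256.HashData(Encoding.UTF8.GetBytes($"{QueryText}|{From:O}|{To:O}|{RawResultJson}")));
}
```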
KPIs
- Citation coverage (% of answers with evidence refs)
- Reduced back-and-forth (“how do you know?” rate)
- Adoption of automation after evidence-first rollout
ADVISORY-AI-002 — Policy-Aware Automation (safe actions, not just suggestions)
Problem
The main blocker to “AI that acts” is governance:
- wrong environment
- insufficient permission
- missing approvals
- non-idempotent actions
- unclear accountability
Why we do it
If Stella can’t safely execute actions, it will remain a read-only assistant. Policy-aware automation is a hard moat because it requires real engineering discipline and operational maturity.
What we ship
- A typed Action Registry:
  - schemas, risk levels, idempotency, rollback/compensation
- A Policy decision point (PDP) before any action:
  - allow / allow-with-approvals / deny
- An Approval workflow linked to Runs
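As a sketch, a registry entry and PDP verdict might look like this (all names hypothetical; the rules shown are toy examples of the environment protections described above):

```csharp
// Sketch: typed action definition plus a policy decision point.
// Illustrative only — not the actual StellaOps.ActionRegistry contract.
using System;

public enum RiskLevel { Low, Medium, High }

public enum PolicyVerdict { Allow, AllowWithApprovals, Deny }

public record ActionDefinition(
    string Name,
    RiskLevel Risk,
    bool Idempotent,
    string? CompensatingAction); // rollback/compensation, where one exists

public record ActionRequest(
    ActionDefinition Definition,
    string IdempotencyKey,   // guardrail: required on every action
    Guid RunId,              // audit linkage back to the Run
    string Environment);

public static class PolicyDecisionPoint
{
    // Toy rules: high-risk actions without rollback are denied outright,
    // and production is always approval-gated.
    public static PolicyVerdict Decide(ActionRequest req)
    {
        if (req.Definition.Risk == RiskLevel.High && req.Definition.CompensatingAction is null)
            return PolicyVerdict.Deny;
        if (req.Environment == "production")
            return PolicyVerdict.AllowWithApprovals;
        return PolicyVerdict.Allow;
    }
}
```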
How we achieve (modules + UI)
Backend modules
- `StellaOps.ActionRegistry`
  - Action definitions + schemas + risk metadata
- `StellaOps.PolicyEngine`
  - Rules: environment protections, freeze windows, role constraints
- `StellaOps.AdvisoryAI.Automation`
  - Converts intent → action proposals
  - Submits action requests after approvals
- `StellaOps.RunLedger`
  - Every action request + result is a ledger entry
UI components
- `ActionProposalCardComponent`
- `ApprovalModalComponent` (scoped approval: this action / this run / this window)
- `PolicyExplanationComponent` (human-readable "why allowed/denied")
- `RollbackPanelComponent`
Guardrails
- Default: propose actions; only auto-execute in explicitly configured "Autopilot scopes."
- Every action must support:
  - idempotency key
  - audit fields (why, ticket/run linkage)
  - reversible/compensating action where feasible
KPIs
- % actions proposed vs executed
- “Policy prevented incident” count
- Approval latency and action success rate
ADVISORY-AI-003 — Ops Memory (structured, durable, queryable)
Problem
Teams repeat incidents because knowledge lives in:
- chat logs
- tribal memory
- scattered tickets
- unwritten heuristics
Chat history is not an operational knowledge base: it’s unstructured and hard to reuse safely.
Why we do it
Ops memory reduces repeat work and accelerates diagnosis. It also becomes a defensible dataset because it’s tied to your Runs, artifacts, and outcomes.
What we ship
A set of typed memory objects (not messages):
- `DecisionRecord`
- `KnownIssue`
- `Tactic`
- `Constraint`
- `PostmortemSummary`
Memory is written on:
- Run closure
- approvals (policy events)
- explicit “save as org memory” actions
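A sketch of the typed-objects idea, with hypothetical shapes for two of the memory kinds (the fields mirror the scope/confidence/supersession guardrails described later in this advisory):

```csharp
// Sketch: typed memory objects instead of chat messages. Shapes are illustrative.
using System;

public enum MemoryConfidence { Verified, Anecdotal }

public abstract record MemoryEntry(
    string Scope,                  // service/env/team
    MemoryConfidence Confidence,
    DateTimeOffset? ExpiresAt,     // review/expiry policy hook
    Guid? SupersededBy);           // conflict handling: "superseded by"

public record KnownIssue(
    string Scope, MemoryConfidence Confidence, DateTimeOffset? ExpiresAt, Guid? SupersededBy,
    string Symptom, string RootCause, string Workaround)
    : MemoryEntry(Scope, Confidence, ExpiresAt, SupersededBy);

public record DecisionRecord(
    string Scope, MemoryConfidence Confidence, DateTimeOffset? ExpiresAt, Guid? SupersededBy,
    string Decision, string Rationale, Guid SourceRunId)
    : MemoryEntry(Scope, Confidence, ExpiresAt, SupersededBy);
```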
How we achieve (modules + UI)
Backend modules
- `StellaOps.AdvisoryAI.Memory`
  - Write: extract structured memory from run artifacts
  - Read: retrieve memory relevant to the current context (service/env/symptoms)
  - Conflict handling: "superseded by", timestamps, confidence
- `StellaOps.MemoryStore` (Postgres tables + full-text index as needed)
UI components
- `MemoryPanelComponent` (contextual suggestions during a run)
- `MemoryBrowserComponent` (search + filters)
- `MemoryDiffComponent` (when superseding prior memory)
Guardrails
- Memory entries have:
  - scope (service/env/team)
  - confidence (verified vs anecdotal)
  - review/expiry policies for tactics/constraints
- Never "learn" from unresolved or low-confidence runs by default.
KPIs
- Repeat incident rate reduction
- Time-to-diagnosis delta when memory exists
- Memory reuse rate inside Runs
ADVISORY-AI-004 — Playbook Learning (Run → Patch → Approved Playbook)
Problem
Runbooks/playbooks drift. Operators improvise. The playbook never improves, and the organization pays the same “tuition” repeatedly.
Why we do it
Playbook learning is the compounding loop that turns daily operations into a proprietary advantage. Competitors can generate playbooks; they struggle to continuously improve them from real run traces with review + governance.
What we ship
- Versioned playbooks as structured objects
- Playbook Patch proposals generated from Run traces:
  - coverage patches, repair patches, optimization patches, safety patches, detection patches
- Owner review + approval workflow
How we achieve (modules + UI)
Backend modules
- `StellaOps.Playbooks`
  - Playbook schema + versioning
- `StellaOps.AdvisoryAI.PlaybookLearning`
  - Extract "what we did" from the Run timeline
  - Compare to playbook steps
  - Propose a patch with evidence links (see the sketch below)
- `StellaOps.DiffService`
  - Human-friendly diff output for UI
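The comparison step could start as naively as this sketch (run steps and playbook steps reduced to plain strings; a real implementation would need fuzzy matching):

```csharp
// Sketch: derive coverage-patch candidates by diffing what the run actually did
// against what the playbook says. Deliberately naive; names are illustrative.
using System.Collections.Generic;
using System.Linq;

public record PlaybookPatchProposal(string Kind, string ProposedStep, string EvidenceRunId);

public static class PlaybookLearning
{
    public static IEnumerable<PlaybookPatchProposal> ProposeCoveragePatches(
        string runId,
        IReadOnlyList<string> runSteps,       // extracted from the Run timeline
        IReadOnlyList<string> playbookSteps)  // current playbook version
    {
        // Steps the operators actually performed that the playbook never mentions
        // become candidate coverage patches, each linked back to its source run.
        return runSteps
            .Where(step => !playbookSteps.Contains(step))
            .Select(step => new PlaybookPatchProposal("coverage", step, runId));
    }
}
```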
UI components
- `PlaybookPatchCardComponent`
- `DiffViewerComponent` (Monaco diff or equivalent)
- `PlaybookApprovalFlowComponent`
- `PlaybookCoverageHeatmapComponent` (optional, later)
Guardrails
- Never auto-edit canonical playbooks; only patches + review.
- Require evidence links for each proposed step.
- Prevent one-off contamination by marking patches as:
  - "generalizable" vs "context-specific"
KPIs
- % incidents with a playbook
- Patch acceptance rate
- MTTR improvement for playbook-backed incidents
ADVISORY-AI-005 — Integration Concierge (setup + health + “how-to” that is actually correct)
Problem
Integrations are where tools die:
- users ask “how do I integrate X”
- assistant answers generically
- setup fails because of environment constraints, permissions, webhooks, scopes, retries, or missing prerequisites
- no one can debug it later
Why we do it
Integration handling becomes a moat when it is:
- deterministic (wizard truth)
- auditable (events + actions traced)
- self-healing (retries, backfills, health checks)
- explainable (precise steps, not generic docs)
What we ship
- Integration Setup Wizard per provider (GitLab, Jira, Slack, etc.)
- Integration Health dashboard:
  - last event received
  - last action executed
  - failure reasons + next steps
  - token expiry warnings
- Chat-driven guidance that drives the same wizard backend:
  - when a user asks "how to integrate GitLab," Stella replies with the exact steps for the instance type, auth mode, and required permissions, and can pre-fill a setup plan.
How we achieve (modules + UI)
Backend modules
- `StellaOps.Integrations`
  - Provider contracts: inbound events + outbound actions
  - Normalization into Stella `Signals` and `Actions`
- `StellaOps.Integrations.Reliability`
  - Webhook dedupe, replay, dead-letter, backfill polling (see the sketch below)
- `StellaOps.AdvisoryAI.Integrations`
  - Retrieves provider-specific setup templates
  - Asks only for missing parameters
  - Produces a "setup checklist" artifact attached to a Run or Integration record
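The reliability layer's dedupe can be as simple as this sketch (provider delivery IDs as dedupe keys; all names are assumptions):

```csharp
// Sketch: webhook dedupe + dead-lettering. Illustrative and in-memory only;
// a real implementation would back both structures with Postgres.
using System;
using System.Collections.Concurrent;

public record InboundEvent(string DeliveryId, string Provider, string PayloadJson);

public sealed class WebhookIntake
{
    private readonly ConcurrentDictionary<string, bool> _seen = new();
    private readonly ConcurrentQueue<InboundEvent> _deadLetter = new();

    public void Receive(InboundEvent evt, Action<InboundEvent> process)
    {
        // Providers retry deliveries; the delivery ID makes intake idempotent.
        if (!_seen.TryAdd(evt.DeliveryId, true)) return;

        try { process(evt); }
        catch
        {
            // Failed events go to a dead-letter queue for replay/backfill.
            _deadLetter.Enqueue(evt);
        }
    }
}
```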
UI components
- `IntegrationWizardComponent`
- `IntegrationHealthComponent`
- `IntegrationEventLogComponent` (raw payload headers + body stored securely)
- `SetupChecklistArtifactComponent` (generated by AdvisoryAI)
Guardrails
- Store inbound webhook payloads for replay/debug, with redaction where required.
- Always support reconciliation/backfill (webhooks are never perfectly lossless).
- Use least-privilege token scopes by default, with clear permission error guidance.
KPIs
- Time-to-first-successful-event
- Integration “healthy” uptime
- Setup completion rate without human support
ADVISORY-AI-006 — Outcome Analytics (prove ROI with credible attribution)
Problem
AI features are easy to cut in budgeting because value is vague. “It feels faster” doesn’t survive scrutiny.
Why we do it
Outcome analytics makes Stella defensible to leadership and helps prioritize what to automate next. It also becomes a dataset for continuous improvement.
What we ship
- Baseline metrics (before Stella influence):
  - MTTA, MTTR, escalation count, repeat incidents, deploy failure rate (as relevant)
- Attribution model (only count impact when Stella materially contributed; see the sketch below):
  - playbook patch accepted
  - evidence pack used
  - policy-gated action executed
  - memory entry reused
- Monthly/weekly impact reports
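A sketch of the conservative attribution rule: an outcome only counts toward Stella's impact if its run carries at least one qualifying artifact (types and artifact-kind strings are illustrative):

```csharp
// Sketch: conservative attribution — an incident's MTTR only counts toward
// Stella's impact when its run carries a qualifying artifact. Names illustrative.
using System;
using System.Collections.Generic;
using System.Linq;

public record RunOutcome(Guid RunId, TimeSpan Mttr, IReadOnlyList<string> ArtifactKinds);

public static class Attribution
{
    private static readonly HashSet<string> Qualifying = new()
    {
        "playbook-patch-accepted", "evidence-pack-used",
        "policy-gated-action-executed", "memory-entry-reused",
    };

    // Average MTTR (minutes) for runs with vs without Stella contribution.
    public static (double WithStella, double Without) MttrSplitMinutes(IEnumerable<RunOutcome> runs)
    {
        var groups = runs.ToLookup(r => r.ArtifactKinds.Any(Qualifying.Contains));
        double Avg(IEnumerable<RunOutcome> g) =>
            g.Any() ? g.Average(r => r.Mttr.TotalMinutes) : 0;
        return (Avg(groups[true]), Avg(groups[false]));
    }
}
```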
How we achieve (modules + UI)
Backend modules
- `StellaOps.Analytics`
  - Metric computation + cohorts (by service/team/severity)
- `StellaOps.AdvisoryAI.Attribution`
  - Joins outcomes to AI artifacts and actions in the Run ledger
- `StellaOps.Reporting`
  - Scheduled report generation (exportable)
UI components
- `OutcomeDashboardComponent`
- `AttributionBreakdownComponent`
- `ExecutiveReportExportComponent`
Guardrails
- Avoid vanity metrics (“number of chats”).
- Always show confidence/limitations in attribution (correlation vs causation).
KPIs
- MTTR delta (with Stella artifacts vs without)
- Repeat incident reduction
- Escalation reduction
- Automation coverage growth
One unifying implementation note: “AdvisoryAI” should output objects, not prose
To make all seven advisories work together, standardize on a small set of AI-produced artifacts:
- `Plan` (step list with tools/actions)
- `EvidencePack`
- `DecisionRecord`
- `PlaybookPatch`
- `IntegrationSetupChecklist`
- `RunSummary` (postmortem-ready)
Every artifact is:
- versioned
- evidence-linked
- attached to a Run
- subject to policy gates when it triggers actions
This gives you:
- auditability
- deterministic replay of the inputs and tool outputs
- compounding “Ops memory” and “Playbook learning” data
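A sketch of the common envelope these artifacts could share (illustrative, not a committed schema):

```csharp
// Sketch: the shared envelope for AI-produced artifacts. Every concrete
// artifact (Plan, EvidencePack, PlaybookPatch, ...) would carry these fields.
using System;
using System.Collections.Generic;

public record ArtifactEnvelope(
    Guid ArtifactId,
    string Kind,                          // "Plan", "EvidencePack", "PlaybookPatch", ...
    int Version,                          // versioned
    Guid RunId,                           // attached to a Run
    IReadOnlyList<string> EvidenceRefs,   // evidence-linked
    bool TriggersActions);                // if true, subject to policy gates
```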
Example: how this maps to “AdvisoryAI module + Chat UI component”
A coherent minimum viable architecture:
Backend
- `StellaOps.AdvisoryAI.WebService` (orchestrator)
- `StellaOps.AdvisoryAI.Evidence` (citations)
- `StellaOps.AdvisoryAI.Memory` (structured memory)
- `StellaOps.AdvisoryAI.PlaybookLearning` (patch proposals)
- `StellaOps.AdvisoryAI.Integrations` (setup guidance + checklists)
- `StellaOps.PolicyEngine` + `StellaOps.ActionRegistry` (gated actions)
- `StellaOps.RunLedger` + `StellaOps.EvidenceStore` (audit + snapshots)
UI
- AI chat panel that emits/accepts artifact cards
- Evidence side panel
- Approval modal
- Diff viewer for patches
- Integration wizard + health
If you want, I can translate these advisories into a single internal spec pack:
- database tables (Postgres)
- event contracts (signals/actions)
- JSON schemas for artifacts/actions
- UI navigation and component tree
- the first 10 “golden workflows” you should ship with the Workbench