# AI Assistant as Proof-Carrying Evidence Engine

**Status:** ANALYZED - Sprints Created
**Date:** 2025-12-26
**Type:** Strategic AI Feature Advisory
**Implementation Sprints:** SPRINT_20251226_015 through 019

---

## Executive Summary

This advisory proposes building Stella Ops AI as a **proof-carrying assistant** that:

- Copies the best UX outcomes from competitors (Snyk: speed to fix; JFrog/Docker: context reduces noise)
- Keeps authority in deterministic, replayable engines and signed evidence packs
- Extends into Stella Ops' moats: lattice merge semantics, deterministic replay, sovereign/offline cryptography

## Advisory Content

### 1) What to Copy: Competitor AI Patterns

#### A. Snyk-style: "developer-time" intelligence

- Fast, developer-local explanation of "why this is a problem" and "what change fixes it"
- Reachability-informed prioritization (not just CVSS)
- Autofix PRs where safe

**Stella Ops takeaway:** Make "time-to-understanding" and "time-to-first-fix" first-class KPIs.

#### B. JFrog-style: contextual exploitability filtering

- "Is this vulnerability exploitable in *this* app?" filtering
- Runtime loaded-code validation to reduce noise

**Stella Ops takeaway:** Treat exploitability as an evidence question; label uncertainty explicitly.

#### C. Aqua-style: AI-guided remediation

- Prescriptive remediation steps in human language (and as patches/PRs)
- Integration into CI/CD and ticketing

**Stella Ops takeaway:** The assistant must be operational: PR creation, change plans, risk-acceptance packages.

#### D. Docker Scout-style: operational context

- Use runtime telemetry to prioritize the vulnerabilities that can actually bite

**Stella Ops takeaway:** Attestable runtime evidence beats "black-box AI prioritization."

#### E. Grype/Trivy reality: deterministic scanners win trust

- Strong data hygiene, VEX ingestion, deterministic outputs

**Stella Ops takeaway:** The AI layer must never undermine deterministic trust; it must be additive, signed, and replayable.
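Takeaway B ("treat exploitability as an evidence question; label uncertainty explicitly") pairs with the five deterministic confidence states this advisory proposes under Feature 2. A minimal sketch of that idea, assuming a hypothetical `Evidence` type and illustrative `kind` strings — not the actual engine's API:

```python
from dataclasses import dataclass
from enum import Enum

class Exploitability(Enum):
    CONFIRMED = "confirmed"
    LIKELY = "likely"
    UNKNOWN = "unknown"
    LIKELY_NOT = "likely_not"
    NOT_EXPLOITABLE = "not_exploitable"

@dataclass(frozen=True)
class Evidence:
    kind: str              # e.g. "runtime_loaded", "reachability", "vex_not_affected"
    supports_exploit: bool  # does this evidence argue the flaw is exploitable here?

def classify(evidence: list[Evidence]) -> Exploitability:
    """Deterministic mapping from an evidence set to a confidence state.
    No evidence yields UNKNOWN; conflicting evidence stays conservative."""
    pro = [e for e in evidence if e.supports_exploit]
    con = [e for e in evidence if not e.supports_exploit]
    if not evidence:
        return Exploitability.UNKNOWN
    if pro and con:
        if len(pro) > len(con):
            return Exploitability.LIKELY
        if len(con) > len(pro):
            return Exploitability.LIKELY_NOT
        return Exploitability.UNKNOWN  # tie: label the uncertainty explicitly
    if pro:
        # runtime-loaded-code evidence (the JFrog/Docker pattern) is the strongest signal
        if any(e.kind == "runtime_loaded" for e in pro):
            return Exploitability.CONFIRMED
        return Exploitability.LIKELY
    if any(e.kind == "vex_not_affected" for e in con):
        return Exploitability.NOT_EXPLOITABLE
    return Exploitability.LIKELY_NOT
```

Because the mapping is a pure function of the evidence set, the same inputs always replay to the same state — which is what lets the AI layer stay additive rather than authoritative.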
### 2) Where Competitors Are Weak (Stella Ops Openings)

1. **Audit-grade reproducibility:** AI explanations are often non-replayable
2. **Offline/sovereign operations:** Air-gapped deployment with local inference is rare
3. **Proof-carrying verdicts:** Most risk scores are opaque
4. **Merge semantics for VEX:** Few vendors ship policy-controlled lattice merge
5. **Time-travel replay + delta verdicts:** Rarely offered as first-class artifacts
6. **Network effects for proofs:** Proof-market ledger concepts are largely absent

### 3) Core Principle: "AI is an assistant; evidence is the authority"

Every AI output must be either:

- **Pure suggestion** (non-authoritative), or
- **Evidence-backed** (authoritative only when the evidence pack suffices)

### 4) Proposed Features

#### Feature 1: Zastava Companion

Evidence-grounded explainability answering: What is it? Why does it matter? What evidence supports it?

- Output anchored to evidence nodes
- OCI-attached "Explanation Attestation" with content hashes + model digest

#### Feature 2: Exploitability Confidence Engine

- Deterministic classification: Confirmed / Likely / Unknown / Likely Not / Not Exploitable
- AI proposes the "cheapest additional evidence" to reduce Unknowns

#### Feature 3: Remedy Autopilot

- AI generates remediation plans
- Automated PRs with reproducible build, tests, SBOM delta, and a signed delta verdict
- Falls back to "suggestion-only" if the build or tests fail

#### Feature 4: Auto-VEX Drafting

- Generate VEX drafts from evidence
- Lattice-aware merge preview

#### Feature 5: Advisory Ingestion Copilot

- Convert unstructured advisories to structured records
- Cross-check multiple sources; require corroboration for "trusted" status

#### Feature 6: Policy Studio Copilot

- Natural language → lattice rules
- Test-case generation
- Compile to a deterministic policy with signed snapshots

### 5) Architecture

- **scanner.webservice:** lattice merges, deterministic verdict engine (authoritative)
- **zastava.webservice (new):** LLM inference + RAG; non-authoritative suggestions
- **Feedser/Vexer:** immutable feed snapshots for replay
- **Postgres:** system of record
- **Valkey:** ephemeral caching (never authoritative)
- **Offline profile:** Postgres-only + local inference bundle

### 6) Deterministic, Replayable AI

Record and hash:

- Prompt template version
- Retrieved evidence node IDs + content hashes
- Model identifier + weights digest
- Decoding parameters (temperature=0, fixed seed)

Emit as OCI-attached attestations: AIExplanation, RemediationPlan, VEXDraft, PolicyDraft.

### 7) Roadmap

- **Phase 1:** Deterministic confidence states + Zastava "Explain with evidence"
- **Phase 2:** Remedy Autopilot + Auto-VEX drafting
- **Phase 3:** Sovereign/offline AI bundle
- **Phase 4:** Proof market + trust economics

### 8) KPIs

- Mean time to triage (MTTT)
- Mean time to remediate (MTTR)
- Noise rate (% of findings that end up "not exploitable")
- Speed of "Unknown" reduction
- Reproducibility (% of AI artifacts that replay to identical output)
- Audit extraction time

### 9) Risks and Mitigations

1. Hallucinations → enforce evidence citation
2. Prompt injection → sanitize and isolate untrusted text
3. Data exfiltration → offline profile, strict egress controls
4. Bad patches → require build + tests + policy gates
5. Model drift → pin model digests, snapshot outputs

---

## Implementation Assessment

### Existing Infrastructure (Substantial)

| Component | Coverage | Location |
|-----------|----------|----------|
| AdvisoryAI Pipeline | 90% | `src/AdvisoryAI/` |
| Guardrail Pipeline | 100% | `AdvisoryAI/Guardrails/` |
| Evidence Retrieval | 80% | SBOM context, vector/structured retrieval |
| TrustLatticeEngine | 100% | `Policy/TrustLattice/` |
| SourceTrustScoreCalculator | 100% | `VexLens/Trust/` |
| Remediation Hints | 30% | `Policy.Unknowns/Services/` |
| ProofChain/Attestations | 100% | `Attestor/ProofChain/` |
| DeltaVerdict | 100% | `StellaOps.DeltaVerdict/` |
| Offline/Airgap | 80% | Various modules |

### Gaps Requiring New Development

1. **LLM-generated explanations** - Feature 1
2. **Remedy Autopilot with PRs** - Feature 3
3. **Policy NL→rules** - Feature 6
4. **AI artifact attestation types** - All features
5. **Sovereign/offline LLM** - Phase 3

### Created Sprints

| Sprint | Topic | Tasks |
|--------|-------|-------|
| SPRINT_20251226_015_AI_zastava_companion | Explanation generation | 21 tasks |
| SPRINT_20251226_016_AI_remedy_autopilot | Automated remediation PRs | 26 tasks |
| SPRINT_20251226_017_AI_policy_copilot | NL→lattice rules | 26 tasks |
| SPRINT_20251226_018_AI_attestations | AI artifact attestation types | 23 tasks |
| SPRINT_20251226_019_AI_offline_inference | Sovereign/offline AI | 26 tasks |

**Total:** 5 sprints, 122 tasks

### Archived Advisory

- "Weighted Confidence for VEX Sources" → moved to `archived/2025-12-26-vex-scoring/` (substantially implemented in VexLens SourceTrustScoreCalculator)
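The "record and hash" recipe from section 6 — prompt template version, evidence node IDs plus content hashes, model identifier and weights digest, decoding parameters — can be sketched as a canonical digest over the replay inputs. A minimal sketch: `record_digest` and its field names are hypothetical illustrations, not Stella Ops' actual attestation schema.

```python
import hashlib
import json

def record_digest(prompt_template_version: str,
                  evidence: list[tuple[str, str]],  # (node_id, content_sha256) pairs
                  model_id: str,
                  weights_digest: str,
                  decoding: dict) -> str:
    """Canonical digest of everything needed to replay an AI artifact.

    Two runs with identical inputs must yield an identical digest, so the
    serialization is canonicalized: evidence is sorted (retrieval order must
    not affect the hash), keys are sorted, and whitespace is fixed.
    """
    canonical = json.dumps({
        "prompt_template": prompt_template_version,
        "evidence": sorted(evidence),
        "model": model_id,
        "weights": weights_digest,
        "decoding": decoding,  # e.g. {"temperature": 0, "seed": 42}
    }, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A verifier replays the artifact from the same pinned inputs and checks that the recomputed digest matches the one recorded in the OCI-attached attestation; any change to the model weights, evidence set, or decoding seed produces a different digest and fails the check.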