AI Assistant as Proof-Carrying Evidence Engine

Status: ANALYZED - Sprints Created
Date: 2025-12-26
Type: Strategic AI Feature Advisory
Implementation Sprints: SPRINT_20251226_015 through 019


Executive Summary

This advisory proposes building Stella Ops AI as a proof-carrying assistant that:

  • Copies best UX outcomes from competitors (Snyk: speed to fix; JFrog/Docker: context reduces noise)
  • Keeps authority in deterministic, replayable engines and signed evidence packs
  • Extends into Stella Ops' moats: lattice merge semantics, deterministic replay, sovereign/offline cryptography

Advisory Content

1) What to Copy: Competitor AI Patterns

A. Snyk-style: "developer-time" intelligence

  • Fast, developer-local explanation of "why this is a problem" and "what change fixes it"
  • Reachability-informed prioritization (not just CVSS)
  • Autofix PRs where safe

Stella Ops takeaway: Make "time-to-understanding" and "time-to-first-fix" first-class KPIs.

B. JFrog-style: contextual exploitability filtering

  • "Is this vulnerability exploitable in this app?" filtering
  • Runtime loaded-code validation to reduce noise

Stella Ops takeaway: Treat exploitability as an evidence question; label uncertainty explicitly.

C. Aqua-style: AI-guided remediation

  • Prescriptive remediation steps in human language (and as patches/PRs)
  • Integrate into CI/CD and ticketing

Stella Ops takeaway: The assistant must be operational: PR creation, change plans, risk acceptance packages.

D. Docker Scout-style: operational context

  • Use runtime telemetry to prioritize vulnerabilities that can actually bite

Stella Ops takeaway: Runtime telemetry captured as attestable evidence beats "black box AI prioritization."

E. Grype/Trivy reality: deterministic scanners win trust

  • Strong data hygiene, VEX ingestion, deterministic outputs

Stella Ops takeaway: The AI layer must never undermine deterministic trust; it must be additive, signed, and replayable.

2) Where Competitors Are Weak (Stella Ops Openings)

  1. Audit-grade reproducibility: AI explanations often non-replayable
  2. Offline/sovereign operations: Air-gapped + local inference rare
  3. Proof-carrying verdicts: Most risk scores are opaque
  4. Merge semantics for VEX: Few ship policy-controlled lattice merge
  5. Time-travel replay + delta verdicts: Rare as first-class artifacts
  6. Network effects for proofs: Proof-market ledger concepts largely absent

3) Core Principle: "AI is an assistant; evidence is the authority"

Every AI output must be either:

  • Pure suggestion (non-authoritative), or
  • Evidence-backed (authoritative only when the evidence pack suffices; see the output-envelope sketch below)
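
A minimal sketch of how that distinction could be carried on every assistant output (Python; the envelope shape and field names are illustrative assumptions, not the shipped API):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvidenceRef:
    node_id: str       # evidence-graph node the claim is anchored to
    content_hash: str  # hash of the cited content at suggestion time

@dataclass
class AssistantOutput:
    text: str
    citations: List[EvidenceRef] = field(default_factory=list)

    def is_authoritative(self) -> bool:
        # Authoritative only when the output is fully evidence-backed;
        # an uncited answer remains a pure, non-authoritative suggestion.
        return bool(self.citations) and all(
            ref.node_id and ref.content_hash for ref in self.citations
        )
```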

4) Proposed Features

Feature 1: Zastava Companion

Evidence-grounded explainability that answers: What is it? Why does it matter? What evidence supports it?

  • Output anchored to evidence nodes
  • OCI-attached "Explanation Attestation" with hashes + model digest

Feature 2: Exploitability Confidence Engine

  • Deterministic classification: Confirmed/Likely/Unknown/Likely Not/Not Exploitable (see the classification sketch after this list)
  • AI proposes "cheapest additional evidence" to reduce Unknown
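
A deterministic classifier over these states could look like the following sketch (Python; the evidence signals and rules are illustrative assumptions, not the shipped rule set):

```python
from enum import Enum
from typing import Optional

class Exploitability(Enum):
    CONFIRMED = "confirmed"
    LIKELY = "likely"
    UNKNOWN = "unknown"
    LIKELY_NOT = "likely_not"
    NOT_EXPLOITABLE = "not_exploitable"

def classify(reachable: Optional[bool],
             loaded_at_runtime: Optional[bool],
             exploit_observed: bool) -> Exploitability:
    # Deterministic rules: identical evidence always yields the same state,
    # and missing evidence is labeled Unknown rather than guessed.
    if exploit_observed:
        return Exploitability.CONFIRMED
    if reachable is None or loaded_at_runtime is None:
        return Exploitability.UNKNOWN
    if reachable and loaded_at_runtime:
        return Exploitability.LIKELY
    if not reachable and not loaded_at_runtime:
        return Exploitability.NOT_EXPLOITABLE
    return Exploitability.LIKELY_NOT
```

The AI's role here stays advisory: it proposes which missing signal (static reachability or runtime load data) is cheapest to collect to move a finding out of Unknown, while the classification itself remains deterministic.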

Feature 3: Remedy Autopilot

  • AI generates remediation plans
  • Automated PRs with reproducible build, tests, SBOM delta, signed delta verdict
  • Fallback to "suggestion-only" if build or tests fail (see the gating sketch below)
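
The gating rule could be as simple as the following sketch (Python; function and field names are hypothetical):

```python
def publish_remediation(plan: dict, build_ok: bool, tests_ok: bool,
                        policy_ok: bool) -> dict:
    # A plan is only allowed to become an automated PR (with reproducible
    # build, tests, SBOM delta, and signed delta verdict attached) when every
    # deterministic gate passes; otherwise it is downgraded to a suggestion.
    if build_ok and tests_ok and policy_ok:
        return {"mode": "automated_pr", "plan": plan}
    return {"mode": "suggestion_only", "plan": plan,
            "reason": "build, test, or policy gate failed"}
```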

Feature 4: Auto-VEX Drafting

  • Generate VEX drafts from evidence
  • Lattice-aware merge preview (an illustrative merge sketch follows this list)
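
One way to present the preview is sketched below (Python; the status ordering is an illustrative "most conservative wins" join, not the policy-controlled lattice shipped in scanner.webservice):

```python
# Illustrative ranking for the preview: the highest-ranked status wins the join.
STATUS_RANK = {"not_affected": 0, "fixed": 1, "under_investigation": 2, "affected": 3}

def merge_preview(drafted_status: str, existing_statuses: list) -> dict:
    # Show what the drafted VEX statement would do to the merged status
    # before anything is signed or persisted.
    def join(statuses):
        return max(statuses, key=STATUS_RANK.__getitem__)

    before = join(existing_statuses) if existing_statuses else None
    after = join(existing_statuses + [drafted_status])
    return {"before": before, "after": after, "changes_verdict": before != after}
```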

Feature 5: Advisory Ingestion Copilot

  • Convert unstructured advisories to structured records
  • Cross-check multiple sources and require corroboration before granting "trusted" status (see the sketch below)
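
A corroboration gate could look like this sketch (Python; field names and the two-source threshold are assumptions):

```python
def corroborated(claims: list, min_sources: int = 2) -> bool:
    # `claims` holds structured records that all describe the same
    # (vulnerability, package, version-range) assertion; the record only
    # earns "trusted" status when enough independent sources agree.
    independent_sources = {c["source"] for c in claims}
    return len(independent_sources) >= min_sources
```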

Feature 6: Policy Studio Copilot

  • NL → lattice rules
  • Test case generation
  • Compile to deterministic policy with signed snapshots (a hypothetical compiled-rule sketch follows this list)
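
The compiled artifact might look like the following sketch (Python; the rule fields and example request are hypothetical, not the shipped policy schema):

```python
# Hypothetical compilation of: "block deploys when a reachable critical CVE
# has no accepted VEX statement".
compiled_rule = {
    "id": "policy.block-reachable-critical",
    "when": {"severity": "critical", "exploitability": ["confirmed", "likely"]},
    "unless": {"vex_status": "not_affected"},
    "then": "deny",
}

# The copilot also emits test fixtures so the deterministic policy engine can
# prove the rule behaves as described before the snapshot is signed.
test_case = {
    "finding": {"severity": "critical", "exploitability": "likely", "vex_status": None},
    "expected_decision": "deny",
}
```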

5) Architecture

  • scanner.webservice: lattice merges, deterministic verdict engine (authoritative)
  • zastava.webservice (new): LLM inference + RAG; non-authoritative suggestions
  • Feedser/Vexer: immutable feed snapshots for replay
  • Postgres: system of record
  • Valkey: ephemeral caching (never authoritative)
  • Offline profile: Postgres-only + local inference bundle (see the profile sketch after this list)
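
The authority boundary can be summarized as a deployment profile, sketched below (Python; profile names and flags are assumptions, not shipped configuration):

```python
profiles = {
    "connected": {
        "authoritative": ["scanner.webservice", "postgres"],
        "advisory_only": ["zastava.webservice"],  # suggestions never gate verdicts
        "cache": "valkey",                        # ephemeral, never a source of truth
        "feeds": ["feedser", "vexer"],            # immutable snapshots for replay
    },
    "offline": {
        "authoritative": ["scanner.webservice", "postgres"],
        "advisory_only": ["zastava.webservice"],
        "cache": None,                            # Postgres-only profile
        "inference": "local model bundle",
    },
}
```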

6) Deterministic, Replayable AI

Record and hash:

  • Prompt template version
  • Retrieved evidence node IDs + content hashes
  • Model identifier + weights digest
  • Decoding parameters (temperature=0, fixed seed)

Emit as OCI-attached attestation: AIExplanation, RemediationPlan, VEXDraft, PolicyDraft.
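
A minimal sketch of canonicalizing and hashing that replay record before it is attached as an attestation (Python; field names are illustrative):

```python
import hashlib
import json

def replay_digest(prompt_template_version: str,
                  evidence: list,     # e.g. [{"node_id": ..., "content_hash": ...}]
                  model_id: str,
                  weights_digest: str,
                  decoding: dict) -> str:
    # Hash every input that determines the AI output. Re-running with the same
    # inputs (temperature=0, fixed seed) should reproduce the artifact
    # byte-for-byte; the digest travels with the OCI-attached attestation.
    record = {
        "prompt_template_version": prompt_template_version,
        "evidence": sorted(evidence, key=lambda e: e["node_id"]),
        "model": {"id": model_id, "weights_digest": weights_digest},
        "decoding": decoding,  # e.g. {"temperature": 0, "seed": 42}
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```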

7) Roadmap

  • Phase 1: Deterministic confidence states + Zastava "Explain with evidence"
  • Phase 2: Remedy Autopilot + Auto-VEX drafting
  • Phase 3: Sovereign/offline AI bundle
  • Phase 4: Proof-market + trust economics

8) KPIs

  • Mean time to triage (MTTT)
  • Mean time to remediate (MTTR)
  • Noise rate (% findings that end up "not exploitable")
  • "Unknown" reduction speed
  • Reproducibility (% AI artifacts replayed to identical output)
  • Audit extraction time

9) Risks and Mitigations

  1. Hallucinations → enforce evidence citation
  2. Prompt injection → sanitize, isolate untrusted text
  3. Data exfiltration → offline profile, strict egress
  4. Bad patches → require build+tests+policy gates
  5. Model drift → pin model digests, snapshot outputs

Implementation Assessment

Existing Infrastructure (Substantial)

| Component | Coverage | Location |
| --- | --- | --- |
| AdvisoryAI Pipeline | 90% | src/AdvisoryAI/ |
| Guardrail Pipeline | 100% | AdvisoryAI/Guardrails/ |
| Evidence Retrieval | 80% | SBOM context, vector/structured retrieval |
| TrustLatticeEngine | 100% | Policy/TrustLattice/ |
| SourceTrustScoreCalculator | 100% | VexLens/Trust/ |
| Remediation Hints | 30% | Policy.Unknowns/Services/ |
| ProofChain/Attestations | 100% | Attestor/ProofChain/ |
| DeltaVerdict | 100% | StellaOps.DeltaVerdict/ |
| Offline/Airgap | 80% | Various modules |

Gaps Requiring New Development

  1. LLM-generated explanations - Feature 1
  2. Remedy Autopilot with PRs - Feature 3
  3. Policy NL→Rules - Feature 6
  4. AI artifact attestation types - All features
  5. Sovereign/offline LLM - Phase 3

Created Sprints

| Sprint | Topic | Tasks |
| --- | --- | --- |
| SPRINT_20251226_015_AI_zastava_companion | Explanation generation | 21 |
| SPRINT_20251226_016_AI_remedy_autopilot | Automated remediation PRs | 26 |
| SPRINT_20251226_017_AI_policy_copilot | NL→lattice rules | 26 |
| SPRINT_20251226_018_AI_attestations | AI artifact attestation types | 23 |
| SPRINT_20251226_019_AI_offline_inference | Sovereign/offline AI | 26 |

Total: 5 sprints, 122 tasks

Archived Advisory

  • "Weighted Confidence for VEX Sources" → moved to archived/2025-12-26-vex-scoring/ (Substantially implemented in VexLens SourceTrustScoreCalculator)