AI Assistant as Proof-Carrying Evidence Engine
Status: ANALYZED - Sprints Created
Date: 2025-12-26
Type: Strategic AI Feature Advisory
Implementation Sprints: SPRINT_20251226_015 through SPRINT_20251226_019
Executive Summary
This advisory proposes building Stella Ops AI as a proof-carrying assistant that:
- Copies best UX outcomes from competitors (Snyk: speed to fix; JFrog/Docker: context reduces noise)
- Keeps authority in deterministic, replayable engines and signed evidence packs
- Extends into Stella Ops' moats: lattice merge semantics, deterministic replay, sovereign/offline cryptography
Advisory Content
1) What to Copy: Competitor AI Patterns
A. Snyk-style: "developer-time" intelligence
- Fast, developer-local explanation of "why this is a problem" and "what change fixes it"
- Reachability-informed prioritization (not just CVSS)
- Autofix PRs where safe
Stella Ops takeaway: Make "time-to-understanding" and "time-to-first-fix" first-class KPIs.
B. JFrog-style: contextual exploitability filtering
- "Is this vulnerability exploitable in this app?" filtering
- Runtime loaded-code validation to reduce noise
Stella Ops takeaway: Treat exploitability as an evidence question; label uncertainty explicitly.
C. Aqua-style: AI-guided remediation
- Prescriptive remediation steps in human language (and as patches/PRs)
- Integrate into CI/CD and ticketing
Stella Ops takeaway: The assistant must be operational: PR creation, change plans, risk acceptance packages.
D. Docker Scout-style: operational context
- Use runtime telemetry to prioritize vulnerabilities that can actually bite
Stella Ops takeaway: Runtime telemetry captured as attestable evidence beats "black box" AI prioritization.
E. Grype/Trivy reality: deterministic scanners win trust
- Strong data hygiene, VEX ingestion, deterministic outputs
Stella Ops takeaway: The AI layer must never undermine deterministic trust; it must be additive, signed, and replayable.
2) Where Competitors Are Weak (Stella Ops Openings)
- Audit-grade reproducibility: AI explanations often non-replayable
- Offline/sovereign operations: Air-gapped + local inference rare
- Proof-carrying verdicts: Most risk scores are opaque
- Merge semantics for VEX: Few ship policy-controlled lattice merge
- Time-travel replay + delta verdicts: Rare as first-class artifacts
- Network effects for proofs: Proof-market ledger concepts largely absent
3) Core Principle: "AI is an assistant; evidence is the authority"
Every AI output must be either:
- Pure suggestion (non-authoritative), or
- Evidence-backed (authoritative only when the evidence pack suffices; see the sketch after this list)
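A minimal sketch of this split, using hypothetical type names (AiOutput, Suggestion, EvidenceBacked); the shipped contracts would live in the assistant service, not here:

```csharp
using System.Collections.Generic;

// Hypothetical contract: every assistant output declares its authority level.
public abstract record AiOutput(string Summary);

// Non-authoritative: may be shown to users, never allowed to gate a verdict.
public sealed record Suggestion(string Summary) : AiOutput(Summary);

// Authoritative only because it cites a concrete, hash-addressed evidence pack.
public sealed record EvidenceBacked(
    string Summary,
    IReadOnlyList<string> EvidenceNodeIds,
    string EvidencePackDigest) : AiOutput(Summary);
```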
4) Proposed Features
Feature 1: Zastava Companion
Evidence-grounded explainability answering: What is it? Why does it matter? What evidence supports it?
- Output anchored to evidence nodes
- OCI-attached "Explanation Attestation" with hashes + model digest
Feature 2: Exploitability Confidence Engine
- Deterministic classification: Confirmed / Likely / Unknown / Likely Not / Not exploitable (one illustrative mapping is sketched below)
- AI proposes the "cheapest additional evidence" to collect in order to shrink the Unknown bucket
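One illustrative mapping from evidence signals to the five states, written as a pure function. The signal names and precedence here are assumptions; the shipped rules belong to the deterministic verdict engine.

```csharp
public enum ExploitabilityState { Confirmed, Likely, Unknown, LikelyNot, NotExploitable }

public static class ExploitabilityClassifier
{
    // Pure and deterministic: the same evidence always yields the same state.
    public static ExploitabilityState Classify(
        bool vexSaysNotAffected,
        bool runtimeLoadsVulnerableCode,
        bool? reachableFromEntryPoint) // null = no reachability evidence yet
    {
        if (vexSaysNotAffected) return ExploitabilityState.NotExploitable;
        if (reachableFromEntryPoint is null) return ExploitabilityState.Unknown;
        if (reachableFromEntryPoint.Value)
            return runtimeLoadsVulnerableCode
                ? ExploitabilityState.Confirmed
                : ExploitabilityState.Likely;
        return ExploitabilityState.LikelyNot; // statically unreachable, pending runtime confirmation
    }
}
```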
Feature 3: Remedy Autopilot
- AI generates remediation plans
- Automated PRs with reproducible build, tests, SBOM delta, signed delta verdict
- Falls back to "suggestion-only" if the build or tests fail (see the gating sketch below)
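A hedged sketch of that publication gate (type and member names are assumptions): an AI-authored change only ships as an authoritative PR backed by a signed delta verdict when the reproducible build, tests, and policy gates all pass; anything less is demoted to advice.

```csharp
public enum PublicationDecision { AuthoritativePullRequest, SuggestionOnly }

public static class RemedyGate
{
    // All three gates must pass before an AI-authored change is published as a PR;
    // otherwise the remediation plan is surfaced as a non-authoritative suggestion.
    public static PublicationDecision Decide(
        bool buildIsReproducible, bool testsPassed, bool policyGatesPassed)
        => buildIsReproducible && testsPassed && policyGatesPassed
            ? PublicationDecision.AuthoritativePullRequest
            : PublicationDecision.SuggestionOnly;
}
```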
Feature 4: Auto-VEX Drafting
- Generate VEX drafts from evidence
- Lattice-aware merge preview
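A small sketch of the drafting step, assuming OpenVEX-style status strings and reusing the ExploitabilityState enum from the Feature 2 sketch; the lattice-aware merge preview itself would come from the lattice engine, not from this mapping.

```csharp
public static class VexDrafter
{
    // Conservative draft mapping; a reviewer (or the policy gate) still approves it,
    // and the merge preview shows how the draft combines with existing statements.
    public static string DraftStatus(ExploitabilityState state) => state switch
    {
        ExploitabilityState.NotExploitable => "not_affected",
        ExploitabilityState.Confirmed or ExploitabilityState.Likely => "affected",
        _ => "under_investigation", // Unknown and LikelyNot stay open for review
    };
}
```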
Feature 5: Advisory Ingestion Copilot
- Convert unstructured advisories to structured records
- Cross-check multiple sources and require corroboration before granting "trusted" status (illustrated below)
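An illustrative corroboration rule (the two-source threshold is an assumption, not a spec): a structured record only earns "trusted" status when independent upstream sources agree.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CorroborationRule
{
    // "Trusted" requires at least two distinct upstream sources backing the record.
    public static bool IsTrusted(IEnumerable<string> corroboratingSourceIds)
        => corroboratingSourceIds.Distinct().Count() >= 2;
}
```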
Feature 6: Policy Studio Copilot
- NL → lattice rules
- Test case generation
- Compile to deterministic policy with signed snapshots
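A hypothetical shape for the compiled output, kept deliberately small: the natural-language prompt is retained only as provenance, while evaluation uses the compiled fields and the signed snapshot digest.

```csharp
public sealed record CompiledLatticeRule(
    string SourcePrompt,           // original natural-language request, kept for provenance
    string Condition,              // e.g. "severity >= high && reachability == confirmed"
    string Action,                 // e.g. "block" or "warn"
    string[] GeneratedTestCaseIds, // test cases emitted alongside the rule
    string PolicySnapshotDigest);  // digest of the deterministic, signed policy snapshot
```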
5) Architecture
- scanner.webservice: lattice merges, deterministic verdict engine (authoritative)
- zastava.webservice (new): LLM inference + RAG; non-authoritative suggestions
- Feedser/Vexer: immutable feed snapshots for replay
- Postgres: system of record
- Valkey: ephemeral caching (never authoritative)
- Offline profile: Postgres-only + local inference bundle
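A sketch of what the offline profile could look like as a strongly typed options object (all names are assumptions): Valkey drops out, egress is off by default, and inference runs from a local bundle.

```csharp
public sealed record OfflineProfileOptions(
    string PostgresConnectionString,  // single system of record in offline mode
    string LocalInferenceBundlePath,  // pinned local model weights and tokenizer
    bool AllowNetworkEgress = false); // strict default: no outbound calls
```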
6) Deterministic, Replayable AI
Record and hash:
- Prompt template version
- Retrieved evidence node IDs + content hashes
- Model identifier + weights digest
- Decoding parameters (temperature=0, fixed seed)
Emit each output as an OCI-attached attestation: AIExplanation, RemediationPlan, VEXDraft, or PolicyDraft.
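A minimal sketch of the replay manifest and its digest, using illustrative field names and standard .NET hashing; the point is that re-running generation against the same manifest must reproduce byte-identical output before an attestation counts as replayed.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text.Json;

// Everything needed to re-run the generation and compare outputs byte-for-byte.
public sealed record AiReplayManifest(
    string PromptTemplateVersion,
    IReadOnlyList<string> EvidenceNodeIds,
    IReadOnlyList<string> EvidenceContentHashes,
    string ModelIdentifier,
    string ModelWeightsDigest,
    double Temperature,  // pinned to 0 for deterministic decoding
    int Seed);           // fixed seed recorded alongside the output

public static class ReplayHasher
{
    // Serialization follows the record's declaration order, so the digest is stable
    // for a given schema version; the schema itself should be versioned as well.
    public static string Digest(AiReplayManifest manifest)
    {
        byte[] json = JsonSerializer.SerializeToUtf8Bytes(manifest);
        return "sha256:" + Convert.ToHexString(SHA256.HashData(json)).ToLowerInvariant();
    }
}
```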
7) Roadmap
- Phase 1: Deterministic confidence states + Zastava "Explain with evidence"
- Phase 2: Remedy Autopilot + Auto-VEX drafting
- Phase 3: Sovereign/offline AI bundle
- Phase 4: Proof-market + trust economics
8) KPIs
- Mean time to triage (MTTT)
- Mean time to remediate (MTTR)
- Noise rate (% findings that end up "not exploitable")
- "Unknown" reduction speed
- Reproducibility (% AI artifacts replayed to identical output)
- Audit extraction time
9) Risks and Mitigations
- Hallucinations → enforce evidence citation
- Prompt injection → sanitize, isolate untrusted text
- Data exfiltration → offline profile, strict egress
- Bad patches → require build+tests+policy gates
- Model drift → pin model digests, snapshot outputs
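As one concrete example of the hallucination mitigation, a guardrail check (names are illustrative) that rejects any explanation citing evidence that was not actually part of its retrieval set:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CitationGuard
{
    // An explanation must cite at least one evidence node, and every citation must
    // point at a node that was genuinely retrieved for this request.
    public static bool CitationsAreGrounded(
        IReadOnlyCollection<string> citedEvidenceIds,
        IReadOnlySet<string> retrievedEvidenceIds)
        => citedEvidenceIds.Count > 0
           && citedEvidenceIds.All(retrievedEvidenceIds.Contains);
}
```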
Implementation Assessment
Existing Infrastructure (Substantial)
| Component | Coverage | Location / Notes |
|---|---|---|
| AdvisoryAI Pipeline | 90% | src/AdvisoryAI/ |
| Guardrail Pipeline | 100% | AdvisoryAI/Guardrails/ |
| Evidence Retrieval | 80% | SBOM context, vector/structured retrieval |
| TrustLatticeEngine | 100% | Policy/TrustLattice/ |
| SourceTrustScoreCalculator | 100% | VexLens/Trust/ |
| Remediation Hints | 30% | Policy.Unknowns/Services/ |
| ProofChain/Attestations | 100% | Attestor/ProofChain/ |
| DeltaVerdict | 100% | StellaOps.DeltaVerdict/ |
| Offline/Airgap | 80% | Various modules |
Gaps Requiring New Development
- LLM-generated explanations (Feature 1)
- Remedy Autopilot with automated PRs (Feature 3)
- Policy NL→rules compilation (Feature 6)
- AI artifact attestation types (all features)
- Sovereign/offline LLM inference (Phase 3)
Created Sprints
| Sprint | Topic | Tasks |
|---|---|---|
| SPRINT_20251226_015_AI_zastava_companion | Explanation generation | 21 tasks |
| SPRINT_20251226_016_AI_remedy_autopilot | Automated remediation PRs | 26 tasks |
| SPRINT_20251226_017_AI_policy_copilot | NL→lattice rules | 26 tasks |
| SPRINT_20251226_018_AI_attestations | AI artifact attestation types | 23 tasks |
| SPRINT_20251226_019_AI_offline_inference | Sovereign/offline AI | 26 tasks |
Total: 5 sprints, 122 tasks
Archived Advisory
- "Weighted Confidence for VEX Sources" → moved to
archived/2025-12-26-vex-scoring/(Substantially implemented in VexLens SourceTrustScoreCalculator)