AI Assistant as Proof-Carrying Evidence Engine

Status: ANALYZED - Sprints Created
Date: 2025-12-26
Type: Strategic AI Feature Advisory
Implementation Sprints: SPRINT_20251226_015 through 019


Executive Summary

This advisory proposes building Stella Ops AI as a proof-carrying assistant that:

  • Copies best UX outcomes from competitors (Snyk: speed to fix; JFrog/Docker: context reduces noise)
  • Keeps authority in deterministic, replayable engines and signed evidence packs
  • Extends into Stella Ops' moats: lattice merge semantics, deterministic replay, sovereign/offline cryptography

Advisory Content

1) What to Copy: Competitor AI Patterns

A. Snyk-style: "developer-time" intelligence

  • Fast, developer-local explanation of "why this is a problem" and "what change fixes it"
  • Reachability-informed prioritization (not just CVSS)
  • Autofix PRs where safe

Stella Ops takeaway: Make "time-to-understanding" and "time-to-first-fix" first-class KPIs.

B. JFrog-style: contextual exploitability filtering

  • "Is this vulnerability exploitable in this app?" filtering
  • Runtime loaded-code validation to reduce noise

Stella Ops takeaway: Treat exploitability as an evidence question; label uncertainty explicitly.

C. Aqua-style: AI-guided remediation

  • Prescriptive remediation steps in human language (and as patches/PRs)
  • Integrate into CI/CD and ticketing

Stella Ops takeaway: The assistant must be operational: PR creation, change plans, risk acceptance packages.

D. Docker Scout-style: operational context

  • Use runtime telemetry to prioritize vulnerabilities that can actually bite

Stella Ops takeaway: Runtime telemetry captured as attestable evidence beats "black box AI prioritization."

E. Grype/Trivy reality: deterministic scanners win trust

  • Strong data hygiene, VEX ingestion, deterministic outputs

Stella Ops takeaway: The AI layer must never undermine deterministic trust; it must be additive, signed, and replayable.

2) Where Competitors Are Weak (Stella Ops Openings)

  1. Audit-grade reproducibility: AI explanations often non-replayable
  2. Offline/sovereign operations: Air-gapped + local inference rare
  3. Proof-carrying verdicts: Most risk scores are opaque
  4. Merge semantics for VEX: Few ship policy-controlled lattice merge
  5. Time-travel replay + delta verdicts: Rare as first-class artifacts
  6. Network effects for proofs: Proof-market ledger concepts largely absent

3) Core Principle: "AI is an assistant; evidence is the authority"

Every AI output must be either:

  • Pure suggestion (non-authoritative), or
  • Evidence-backed (authoritative only when the evidence pack suffices; see the output-envelope sketch below)
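
A minimal sketch of how that distinction could be carried on every assistant output (Python; the envelope shape and field names are illustrative assumptions, not the shipped API):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvidenceRef:
    node_id: str       # evidence-graph node the claim is anchored to
    content_hash: str  # hash of the cited content at suggestion time

@dataclass
class AssistantOutput:
    text: str
    citations: List[EvidenceRef] = field(default_factory=list)

    def is_authoritative(self) -> bool:
        # Authoritative only when the output is fully evidence-backed;
        # an uncited answer remains a pure, non-authoritative suggestion.
        return bool(self.citations) and all(
            ref.node_id and ref.content_hash for ref in self.citations
        )
```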

4) Proposed Features

Feature 1: Zastava Companion

Evidence-grounded explainability that answers: What is it? Why does it matter? What evidence supports it?

  • Output anchored to evidence nodes
  • OCI-attached "Explanation Attestation" with hashes + model digest

Feature 2: Exploitability Confidence Engine

  • Deterministic classification: Confirmed/Likely/Unknown/Likely Not/Not Exploitable (see the classification sketch after this list)
  • AI proposes "cheapest additional evidence" to reduce Unknown
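
A deterministic classifier over these states could look like the following sketch (Python; the evidence signals and rules are illustrative assumptions, not the shipped rule set):

```python
from enum import Enum
from typing import Optional

class Exploitability(Enum):
    CONFIRMED = "confirmed"
    LIKELY = "likely"
    UNKNOWN = "unknown"
    LIKELY_NOT = "likely_not"
    NOT_EXPLOITABLE = "not_exploitable"

def classify(reachable: Optional[bool],
             loaded_at_runtime: Optional[bool],
             exploit_observed: bool) -> Exploitability:
    # Deterministic rules: identical evidence always yields the same state,
    # and missing evidence is labeled Unknown rather than guessed.
    if exploit_observed:
        return Exploitability.CONFIRMED
    if reachable is None or loaded_at_runtime is None:
        return Exploitability.UNKNOWN
    if reachable and loaded_at_runtime:
        return Exploitability.LIKELY
    if not reachable and not loaded_at_runtime:
        return Exploitability.NOT_EXPLOITABLE
    return Exploitability.LIKELY_NOT
```

The AI's role here stays advisory: it proposes which missing signal (static reachability or runtime load data) is cheapest to collect to move a finding out of Unknown, while the classification itself remains deterministic.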

Feature 3: Remedy Autopilot

  • AI generates remediation plans
  • Automated PRs with reproducible build, tests, SBOM delta, signed delta verdict
  • Fallback to "suggestion-only" if build or tests fail (see the gating sketch below)
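
The gating rule could be as simple as the following sketch (Python; function and field names are hypothetical):

```python
def publish_remediation(plan: dict, build_ok: bool, tests_ok: bool,
                        policy_ok: bool) -> dict:
    # A plan is only allowed to become an automated PR (with reproducible
    # build, tests, SBOM delta, and signed delta verdict attached) when every
    # deterministic gate passes; otherwise it is downgraded to a suggestion.
    if build_ok and tests_ok and policy_ok:
        return {"mode": "automated_pr", "plan": plan}
    return {"mode": "suggestion_only", "plan": plan,
            "reason": "build, test, or policy gate failed"}
```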

Feature 4: Auto-VEX Drafting

  • Generate VEX drafts from evidence
  • Lattice-aware merge preview (an illustrative merge sketch follows this list)
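
One way to present the preview is sketched below (Python; the status ordering is an illustrative "most conservative wins" join, not the policy-controlled lattice shipped in scanner.webservice):

```python
# Illustrative ranking for the preview: the highest-ranked status wins the join.
STATUS_RANK = {"not_affected": 0, "fixed": 1, "under_investigation": 2, "affected": 3}

def merge_preview(drafted_status: str, existing_statuses: list) -> dict:
    # Show what the drafted VEX statement would do to the merged status
    # before anything is signed or persisted.
    def join(statuses):
        return max(statuses, key=STATUS_RANK.__getitem__)

    before = join(existing_statuses) if existing_statuses else None
    after = join(existing_statuses + [drafted_status])
    return {"before": before, "after": after, "changes_verdict": before != after}
```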

Feature 5: Advisory Ingestion Copilot

  • Convert unstructured advisories to structured records
  • Cross-check multiple sources and require corroboration before granting "trusted" status (see the sketch below)
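
A corroboration gate could look like this sketch (Python; field names and the two-source threshold are assumptions):

```python
def corroborated(claims: list, min_sources: int = 2) -> bool:
    # `claims` holds structured records that all describe the same
    # (vulnerability, package, version-range) assertion; the record only
    # earns "trusted" status when enough independent sources agree.
    independent_sources = {c["source"] for c in claims}
    return len(independent_sources) >= min_sources
```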

Feature 6: Policy Studio Copilot

  • NL → lattice rules
  • Test case generation
  • Compile to deterministic policy with signed snapshots (a hypothetical compiled-rule sketch follows this list)
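
The compiled artifact might look like the following sketch (Python; the rule fields and example request are hypothetical, not the shipped policy schema):

```python
# Hypothetical compilation of: "block deploys when a reachable critical CVE
# has no accepted VEX statement".
compiled_rule = {
    "id": "policy.block-reachable-critical",
    "when": {"severity": "critical", "exploitability": ["confirmed", "likely"]},
    "unless": {"vex_status": "not_affected"},
    "then": "deny",
}

# The copilot also emits test fixtures so the deterministic policy engine can
# prove the rule behaves as described before the snapshot is signed.
test_case = {
    "finding": {"severity": "critical", "exploitability": "likely", "vex_status": None},
    "expected_decision": "deny",
}
```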

5) Architecture

  • scanner.webservice: lattice merges, deterministic verdict engine (authoritative)
  • zastava.webservice (new): LLM inference + RAG; non-authoritative suggestions
  • Feedser/Vexer: immutable feed snapshots for replay
  • Postgres: system of record
  • Valkey: ephemeral caching (never authoritative)
  • Offline profile: Postgres-only + local inference bundle (see the profile sketch after this list)
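
The authority boundary can be summarized as a deployment profile, sketched below (Python; profile names and flags are assumptions, not shipped configuration):

```python
profiles = {
    "connected": {
        "authoritative": ["scanner.webservice", "postgres"],
        "advisory_only": ["zastava.webservice"],  # suggestions never gate verdicts
        "cache": "valkey",                        # ephemeral, never a source of truth
        "feeds": ["feedser", "vexer"],            # immutable snapshots for replay
    },
    "offline": {
        "authoritative": ["scanner.webservice", "postgres"],
        "advisory_only": ["zastava.webservice"],
        "cache": None,                            # Postgres-only profile
        "inference": "local model bundle",
    },
}
```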

6) Deterministic, Replayable AI

Record and hash:

  • Prompt template version
  • Retrieved evidence node IDs + content hashes
  • Model identifier + weights digest
  • Decoding parameters (temperature=0, fixed seed)

Emit as OCI-attached attestation: AIExplanation, RemediationPlan, VEXDraft, PolicyDraft.
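
A minimal sketch of canonicalizing and hashing that replay record before it is attached as an attestation (Python; field names are illustrative):

```python
import hashlib
import json

def replay_digest(prompt_template_version: str,
                  evidence: list,     # e.g. [{"node_id": ..., "content_hash": ...}]
                  model_id: str,
                  weights_digest: str,
                  decoding: dict) -> str:
    # Hash every input that determines the AI output. Re-running with the same
    # inputs (temperature=0, fixed seed) should reproduce the artifact
    # byte-for-byte; the digest travels with the OCI-attached attestation.
    record = {
        "prompt_template_version": prompt_template_version,
        "evidence": sorted(evidence, key=lambda e: e["node_id"]),
        "model": {"id": model_id, "weights_digest": weights_digest},
        "decoding": decoding,  # e.g. {"temperature": 0, "seed": 42}
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```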

7) Roadmap

  • Phase 1: Deterministic confidence states + Zastava "Explain with evidence"
  • Phase 2: Remedy Autopilot + Auto-VEX drafting
  • Phase 3: Sovereign/offline AI bundle
  • Phase 4: Proof-market + trust economics

8) KPIs

  • Mean time to triage (MTTT)
  • Mean time to remediate (MTTR)
  • Noise rate (% findings that end up "not exploitable")
  • "Unknown" reduction speed
  • Reproducibility (% AI artifacts replayed to identical output)
  • Audit extraction time

9) Risks and Mitigations

  1. Hallucinations → enforce evidence citation
  2. Prompt injection → sanitize, isolate untrusted text
  3. Data exfiltration → offline profile, strict egress
  4. Bad patches → require build+tests+policy gates
  5. Model drift → pin model digests, snapshot outputs

Implementation Assessment

Existing Infrastructure (Substantial)

| Component | Coverage | Location |
| --- | --- | --- |
| AdvisoryAI Pipeline | 90% | src/AdvisoryAI/ |
| Guardrail Pipeline | 100% | AdvisoryAI/Guardrails/ |
| Evidence Retrieval | 80% | SBOM context, vector/structured retrieval |
| TrustLatticeEngine | 100% | Policy/TrustLattice/ |
| SourceTrustScoreCalculator | 100% | VexLens/Trust/ |
| Remediation Hints | 30% | Policy.Unknowns/Services/ |
| ProofChain/Attestations | 100% | Attestor/ProofChain/ |
| DeltaVerdict | 100% | StellaOps.DeltaVerdict/ |
| Offline/Airgap | 80% | Various modules |

Gaps Requiring New Development

  1. LLM-generated explanations - Feature 1
  2. Remedy Autopilot with PRs - Feature 3
  3. Policy NL→Rules - Feature 6
  4. AI artifact attestation types - All features
  5. Sovereign/offline LLM - Phase 3

Created Sprints

| Sprint | Topic | Tasks |
| --- | --- | --- |
| SPRINT_20251226_015_AI_zastava_companion | Explanation generation | 21 |
| SPRINT_20251226_016_AI_remedy_autopilot | Automated remediation PRs | 26 |
| SPRINT_20251226_017_AI_policy_copilot | NL→lattice rules | 26 |
| SPRINT_20251226_018_AI_attestations | AI artifact attestation types | 23 |
| SPRINT_20251226_019_AI_offline_inference | Sovereign/offline AI | 26 |

Total: 5 sprints, 122 tasks

Archived Advisory

  • "Weighted Confidence for VEX Sources" → moved to archived/2025-12-26-vex-scoring/ (Substantially implemented in VexLens SourceTrustScoreCalculator)