# Unified Triage Experience Specification
**Version:** 1.0
**Status:** Active
**Last Updated:** 2025-12-26
**Consolidated From:** 3 product advisories (see References)
## 1. Executive Summary
### The Problem
Container security tooling generates more vulnerability data than teams can triage. Competitors offer fragmented solutions: Snyk provides reachability analysis, Anchore offers VEX annotations, and Prisma Cloud delivers runtime signals. Security teams must switch between tools, losing time and context with each switch.
### The Stella Ops Solution
A **unified triage canvas** that combines:
- Rich evidence visualization with proof-carrying verdicts
- VEX decisioning as first-class policy objects
- AI-assisted analysis via AdvisoryAI
- Attestable exceptions with audit trails
- Offline-first architecture for air-gapped parity
## 2. Competitive Landscape
### Snyk — Reachability + Continuous Context
- Implements reachability analysis building call graphs
- Factors reachability into priority scores
- Uses static analysis + AI + expert curation
- Tracks issues over time without re-scanning unchanged images
### Anchore — Vulnerability Annotations + VEX Export
- Vulnerability annotation workflows via UI/API
- Labels: "not applicable", "mitigated", "under investigation"
- Export as OpenVEX and CycloneDX VEX
- Downstream consumers receive curated exploitability state
### Prisma Cloud — Runtime Defense
- Continuous behavioral profiling
- Process, file, and network rule enforcement
- Learning models baseline expected behavior
- Runtime context during operational incidents
### Stella Ops Differentiation
| Feature | Snyk | Anchore | Prisma | Stella Ops |
|---------|------|---------|--------|------------|
| Reachability analysis | Yes | Partial | No | Yes (static + binary + runtime) |
| VEX as policy objects | No | Export only | No | **First-class** |
| Attestable exceptions | No | No | No | **Yes (DSSE)** |
| Offline replay | No | No | No | **Yes** |
| AI-assisted triage | Yes | No | No | Yes (AdvisoryAI) |
| Evidence graphs | Partial | No | Partial | **Full chain** |
## 3. Core UI Concepts
### 3.1 Visual Diff Pattern
Every policy decision or reachability change is treated as a **visual diff**, enabling quick, explainable triage.
#### Side-by-Side Panes
- **Before** (previous scan/policy) vs **After** (current)
- Show dependency/reachability subgraph
- Highlight added/removed/changed nodes/edges
#### Evidence Strip (Right Rail)
Human-readable facts used by the engine:
- Feature flag status (e.g., "feature flag OFF")
- Code path analysis (e.g., "code path unreachable")
- Runtime traces (e.g., "kernel eBPF trace absent")
#### Diff Verdict Header
```
Risk ↓ from Medium → Low (policy v1.8 → v1.9)
```
#### Filter Chips
Scope by: component, package, CVE, policy rule, environment
### 3.2 Data Models
```typescript
interface GraphSnapshot {
  nodes: GraphNode[];
  edges: GraphEdge[];
  metadata: { component: string; version: string; tags: string[] };
}

interface PolicySnapshot {
  version: string;
  rulesHash: string;
  inputs: { flags: Record<string, boolean>; env: string; vexSources: string[] };
}

interface Delta {
  added: DeltaItem[];
  removed: DeltaItem[];
  changed: DeltaItem[];
  ruleOutcomes: RuleOutcomeDelta[];
}

interface EvidenceItem {
  type: 'trace_hit' | 'sbom_line' | 'vex_claim' | 'config_value';
  source: string;
  digest: string;
  excerpt: string;
  timestamp: string;
}

interface SignedDeltaVerdict {
  status: 'routine' | 'review' | 'block';
  signatures: Signature[];
  producer: string;
}
```
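
As an illustration of how the `added`, `removed`, and `changed` lists of a `Delta` might be filled, the following minimal sketch compares two node lists by ID. The spec does not define `GraphNode` or `DeltaItem`, so `SketchNode`, its `digest` field, `SketchDelta`, and `diffNodes` are hypothetical stand-ins rather than the actual models.

```typescript
// Hypothetical minimal shapes: the spec does not define GraphNode or DeltaItem.
interface SketchNode {
  id: string;
  digest: string; // content hash used to detect a changed node (assumption)
}

interface SketchDelta {
  added: string[];
  removed: string[];
  changed: string[];
}

// Compare two node lists by id. A node present in both snapshots
// with a different digest is reported as "changed".
function diffNodes(before: SketchNode[], after: SketchNode[]): SketchDelta {
  const beforeById = new Map<string, SketchNode>();
  for (const node of before) beforeById.set(node.id, node);
  const afterIds = new Set<string>(after.map((node) => node.id));

  const delta: SketchDelta = { added: [], removed: [], changed: [] };
  for (const node of after) {
    const previous = beforeById.get(node.id);
    if (previous === undefined) {
      delta.added.push(node.id);
    } else if (previous.digest !== node.digest) {
      delta.changed.push(node.id);
    }
  }
  for (const node of before) {
    if (!afterIds.has(node.id)) delta.removed.push(node.id);
  }
  return delta;
}
```

Comparing by digest rather than by deep equality keeps the diff deterministic and cheap, which fits the replayability goal, but the choice of hash input is left open here.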
### 3.3 Micro-Interactions
| Interaction | Behavior |
|-------------|----------|
| Hover changed node | Inline badge explaining "why it changed" |
| Click rule in rail | Spotlight the exact subgraph affected |
| Toggle "explain like I'm new" | Expands jargon into plain language |
| One-click "copy audit bundle" | Exports delta + evidence as attachment |
### 3.4 Keyboard Shortcuts
| Key | Action |
|-----|--------|
| `1` | Focus changes only |
| `2` | Show full graph |
| `E` | Expand evidence |
| `A` | Export audit bundle |
| `N` | Next item in queue |
| `P` | Previous item |
| `M` | Mark not affected |
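
The table above could be wired up as a simple key-to-action lookup. This is a minimal sketch: the action names, `resolveShortcut`, and the rule that shortcuts are ignored while a text field has focus or a modifier key is held are assumptions, not part of the spec.

```typescript
// Action names are hypothetical labels for the rows of the shortcut table.
type TriageAction =
  | "focusChanges"
  | "showFullGraph"
  | "expandEvidence"
  | "exportAuditBundle"
  | "nextItem"
  | "previousItem"
  | "markNotAffected";

const SHORTCUTS = new Map<string, TriageAction>([
  ["1", "focusChanges"],
  ["2", "showFullGraph"],
  ["e", "expandEvidence"],
  ["a", "exportAuditBundle"],
  ["n", "nextItem"],
  ["p", "previousItem"],
  ["m", "markNotAffected"],
]);

interface KeyContext {
  typingInInput: boolean; // focus is inside an input, textarea, or editor
  modifierHeld: boolean;  // Ctrl, Alt, or Meta is pressed
}

// Resolve a key press to an action. Returns undefined for unmapped keys,
// while the user is typing, or when a modifier key is held.
function resolveShortcut(key: string, ctx: KeyContext): TriageAction | undefined {
  if (ctx.typingInInput || ctx.modifierHeld) return undefined;
  return SHORTCUTS.get(key.toLowerCase());
}
```

Single-letter shortcuts such as `M` would otherwise fire while an analyst types a justification, which is why the sketch checks focus before dispatching.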
## 4. Risk Budget Visualization
### 4.1 Concept
- **Risk budget** = allowable unresolved risk for a release (e.g., 100 "risk points")
- **Burn** = consumption rate as alerts appear, minus "payback" from fixes
### 4.2 Dashboard Components
#### Heatmap of Unknowns
| Component | Vulns | Compliance | Perf | Data | Supply Chain |
|-----------|-------|------------|------|------|--------------|
| Service A | 🟡 12 | 🟢 0 | 🟡 3 | 🔴 8 | 🟡 5 |
| Service B | 🔴 24 | 🟡 2 | 🟢 1 | 🟡 4 | 🟢 0 |
Cell value = unknowns count × severity weight
#### Delta Table (Risk Decay per Release)
| Release | Before | After | Retired | Shifted | Unknowns |
|---------|--------|-------|---------|---------|----------|
| v2.3.1 | 85 | 67 | -22 | +4 | 12 |
| v2.3.0 | 92 | 85 | -15 | +8 | 18 |
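
Both sample rows are consistent with After = Before + Retired + Shifted (85 − 22 + 4 = 67 and 92 − 15 + 8 = 85). The spec does not state this relation explicitly, so the sketch below treats it as an inference from the sample data; `ReleaseDeltaRow` and `riskAfter` are hypothetical names.

```typescript
// One row of the delta table. Field names mirror the column headers.
interface ReleaseDeltaRow {
  release: string;
  before: number;
  retired: number;  // risk removed by fixes, recorded as a negative number
  shifted: number;  // risk moved in from elsewhere, recorded as a positive number
  unknowns: number;
}

// Inferred relation: After = Before + Retired + Shifted.
function riskAfter(row: ReleaseDeltaRow): number {
  return row.before + row.retired + row.shifted;
}
```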
#### Exception Ledger
Every accepted risk has: ID, owner, expiry, evidence note, auto-reminder.
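
A ledger entry with those five fields might look like the following sketch. The field names, the `reminderDaysBefore` setting, and the three status values are illustrative assumptions; the spec only lists the fields.

```typescript
// Hypothetical shape for one accepted-risk entry.
interface RiskException {
  id: string;
  owner: string;
  expiresAt: string;          // ISO-8601 timestamp
  evidenceNote: string;
  reminderDaysBefore: number; // how long before expiry the auto-reminder fires (assumption)
}

type ExceptionStatus = "active" | "reminder_due" | "expired";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Classify an exception relative to `now`.
function exceptionStatus(ex: RiskException, now: Date): ExceptionStatus {
  const remainingMs = Date.parse(ex.expiresAt) - now.getTime();
  if (remainingMs <= 0) return "expired";
  if (remainingMs <= ex.reminderDaysBefore * MS_PER_DAY) return "reminder_due";
  return "active";
}
```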
### 4.3 Risk Budget Burn-Up Chart
```
Risk Points
    ^
100 |__________ Budget Line (flat or stepped)
    |          \
 80 |           \   ← Actual Risk (cumulative)
    |            \
 60 |             \_____ Headroom (green)
    |                   \
 40 |                    \__ Target by release
    |
    +---------------------------------> Time
    T-30    T-14    T-7    T-2    Release
```
- **X-axis:** Calendar dates to code freeze
- **Y-axis:** Risk points
- **Two lines:** Budget (flat/stepped) + Actual Risk (daily)
- **Shaded area:** Headroom (green) or Overrun (red)
- **Markers:** Feature freeze, pen-test, dependency bumps
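
The shaded area can be derived from the two lines as the daily difference between budget and actual risk. The sketch below assumes one data point per day and treats "exactly on budget" as headroom rather than overrun; `BurnPoint`, `HeadroomPoint`, and `toHeadroomSeries` are hypothetical names.

```typescript
interface BurnPoint {
  date: string;       // calendar date, e.g. "2025-12-01"
  budget: number;     // budget line value on that date
  actualRisk: number; // actual risk points on that date
}

interface HeadroomPoint {
  date: string;
  headroom: number;        // budget minus actual risk; negative means overrun
  shade: "green" | "red";
}

// Convert the two chart lines into the series that drives the shaded area.
function toHeadroomSeries(points: BurnPoint[]): HeadroomPoint[] {
  return points.map((p) => {
    const headroom = p.budget - p.actualRisk;
    const shade: "green" | "red" = headroom >= 0 ? "green" : "red";
    return { date: p.date, headroom, shade };
  });
}
```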
### 4.4 Computation Formulas
```typescript
// Risk points per issue
risk_points = severity_weight * exposure_factor * evidence_freshness_penalty;

// Unknown penalty: no fresh evidence within the last `threshold` days
if (days_since_evidence > threshold) {
  risk_points *= 1.5; // multiplier
}

// Decay on fix
if (fix_landed && evidence_refreshed) {
  subtract_points(issue.risk_points);
}

// Guardrails
if (unknowns > K || actual_risk > budget) {
  fail_gate();
}
```
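
The formulas above can be exercised end to end with a small, self-contained sketch. The type and function names (`IssueRisk`, `GateConfig`, `riskPoints`, `gatePasses`) are hypothetical. "Decay on fix" is modeled here by passing only still-open issues to the gate, which is one possible reading of the formula.

```typescript
interface IssueRisk {
  severityWeight: number;
  exposureFactor: number;
  evidenceFreshnessPenalty: number;
  daysSinceEvidence: number;
}

interface GateConfig {
  staleAfterDays: number;  // "threshold" in the formulas
  staleMultiplier: number; // 1.5 in the formulas
  maxUnknowns: number;     // "K" in the formulas
  budget: number;          // risk budget for the release
}

// Risk points for one issue, including the unknown penalty for stale evidence.
function riskPoints(issue: IssueRisk, cfg: GateConfig): number {
  const base = issue.severityWeight * issue.exposureFactor * issue.evidenceFreshnessPenalty;
  return issue.daysSinceEvidence > cfg.staleAfterDays ? base * cfg.staleMultiplier : base;
}

// Guardrail: the gate fails on too many unknowns or on a budget overrun.
// Fixed issues are expected to be absent from `openIssues`.
function gatePasses(openIssues: IssueRisk[], unknowns: number, cfg: GateConfig): boolean {
  const actualRisk = openIssues.reduce((sum, issue) => sum + riskPoints(issue, cfg), 0);
  return !(unknowns > cfg.maxUnknowns || actualRisk > cfg.budget);
}
```

With a severity weight of 10, an exposure factor of 2, and a freshness penalty of 1, an issue scores 20 points while its evidence is fresh and 30 points once the evidence is older than the threshold.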
## 5. Implementation Components
### 5.1 Component Hierarchy
```
TriageCanvasComponent
├── TriageListComponent
│ ├── SeverityFilterComponent
│ ├── VulnerabilityRowComponent
│ └── BulkActionBarComponent
├── TriageDetailComponent
│ ├── AffectedPackagesPanel
│ ├── AdvisoryRefsPanel
│ ├── ReachabilityContextComponent
│ └── EvidenceProvenanceComponent
├── AiRecommendationPanel
│ ├── ReachabilityExplanation
│ ├── SuggestedJustification
│ └── SimilarVulnsComponent
├── VexDecisionModalComponent
│ ├── StatusSelector
│ ├── JustificationTypeSelector
│ ├── EvidenceRefInput
│ └── ScopeSelector
└── VexHistoryComponent
CompareViewComponent
├── BaselineSelectorComponent
├── TrustIndicatorsComponent
├── DeltaSummaryStripComponent
├── ThreePaneLayoutComponent
│ ├── CategoriesPaneComponent
│ ├── ItemsPaneComponent
│ └── ProofPaneComponent
├── ActionablesPanelComponent
└── ExportActionsComponent
RiskDashboardComponent
├── BurnUpChartComponent
├── UnknownsHeatmapComponent
├── DeltaTableComponent
├── ExceptionLedgerComponent
└── KpiTilesComponent
```
### 5.2 Service Layer
```typescript
// Core services
TriageService // Vulnerability list + filtering
VexDecisionService // CRUD for VEX decisions
AdvisoryAiService // AI recommendations
CompareService // Baseline + delta computation
RiskBudgetService // Budget + burn tracking
EvidenceService // Evidence retrieval
```
## 6. API Integration
### VulnExplorer Endpoints
```
GET /api/v1/vulnerabilities // List with filters
GET /api/v1/vulnerabilities/{id} // Detail
GET /api/v1/vulnerabilities/{id}/reachability // Call graph slice
POST /api/v1/vex-decisions // Create VEX decision
PUT /api/v1/vex-decisions/{id} // Update VEX decision
GET /api/v1/vex-decisions?vulnId={id} // History for vuln
```
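
A client would typically build these paths through small helpers so that vulnerability IDs are always escaped. The sketch below covers two of the endpoints listed above; the helper names are hypothetical, and request and response bodies are not shown because the spec does not define them.

```typescript
const API_ROOT = "/api/v1";

// GET /api/v1/vulnerabilities/{id}/reachability
function reachabilityUrl(vulnId: string): string {
  return `${API_ROOT}/vulnerabilities/${encodeURIComponent(vulnId)}/reachability`;
}

// GET /api/v1/vex-decisions?vulnId={id}
function vexHistoryUrl(vulnId: string): string {
  return `${API_ROOT}/vex-decisions?vulnId=${encodeURIComponent(vulnId)}`;
}
```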
### AdvisoryAI Endpoints
```
POST /api/v1/advisory/plan // Get analysis plan
POST /api/v1/advisory/execute // Execute analysis
GET /api/v1/advisory/output/{taskId} // Get recommendations
```
### Delta/Compare Endpoints
```
GET /api/v1/baselines/recommendations/{digest}
POST /api/v1/delta/compute
GET /api/v1/delta/{id}/trust-indicators
GET /api/v1/actionables/delta/{id}
```
## 7. Implementation Status
| Component | Sprint | Status |
|-----------|--------|--------|
| Risk Dashboard Base | SPRINT_20251226_004_FE | TODO |
| Smart-Diff Compare View | SPRINT_20251226_012_FE | TODO |
| Unified Triage Canvas | SPRINT_20251226_013_FE | TODO |
| Documentation Consolidation | SPRINT_20251226_014_DOCS | TODO |
| VEX Decision Models | VulnExplorer/Models | **COMPLETE** |
| AdvisoryAI Pipeline | src/AdvisoryAI | **COMPLETE** |
| Confidence Badge | Web/shared/components | **COMPLETE** |
| Release Flow | Web/features/releases | **COMPLETE** |
## 8. Testing Strategy
### Unit Tests
- Component behavior (selection, filtering, expansion)
- Signal/computed derivations
- Role-based view switching
- Form validation (VEX decisions)
### Integration Tests
- API service calls and response handling
- Navigation and routing
- State persistence across route changes
### E2E Tests
- Full triage workflow: list → detail → VEX decision
- Comparison workflow: select baseline → compute delta → export
- Risk budget: view charts → create exception → see update
### Accessibility Tests
- Keyboard navigation completeness
- Screen reader announcements
- Color contrast compliance
## 9. Success Metrics
| Metric | Definition | Target |
|--------|------------|--------|
| Mean Time to Triage (MTTT) | Time from vuln notification to VEX decision | < 5 min |
| Mean Time to Explain (MTTE) | Time from "why did this change?" to "Understood" click | < 2 min |
| Triage Queue Throughput | Vulns triaged per hour per analyst | > 20 |
| AI Recommendation Acceptance | % of AI suggestions accepted without modification | > 60% |
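
The first and last metrics can be computed directly from event timestamps and counts. This is a minimal sketch under stated assumptions: `TriageEvent` and its field names are hypothetical, and timestamps are assumed to be ISO-8601 strings.

```typescript
// Hypothetical event record: when the vuln was notified and when the VEX decision was made.
interface TriageEvent {
  notifiedAt: string; // ISO-8601 timestamp
  decidedAt: string;  // ISO-8601 timestamp
}

// Mean Time to Triage in minutes; null when there are no events to average.
function meanTimeToTriageMinutes(events: TriageEvent[]): number | null {
  if (events.length === 0) return null;
  const totalMs = events.reduce(
    (sum, e) => sum + (Date.parse(e.decidedAt) - Date.parse(e.notifiedAt)),
    0,
  );
  return totalMs / events.length / 60000;
}

// AI Recommendation Acceptance as a percentage; null when no suggestions were made.
function acceptancePercent(acceptedUnmodified: number, totalSuggestions: number): number | null {
  if (totalSuggestions === 0) return null;
  return (acceptedUnmodified / totalSuggestions) * 100;
}
```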
## 10. References
### Archived Advisories (Consolidated Here)
- `archived/2025-12-26-triage-advisories/25-Dec-2025 - Triage UI Lessons from Competitors.md`
- `archived/2025-12-26-triage-advisories/25-Dec-2025 - Visual Diffs for Explainable Triage.md`
- `archived/2025-12-26-triage-advisories/26-Dec-2026 - Visualizing the Risk Budget.md`
### Related Documentation
- `docs/modules/web/smart-diff-ui-architecture.md`
- `docs/implplan/SPRINT_20251226_004_FE_risk_dashboard.md`
- `docs/implplan/SPRINT_20251226_012_FE_smart_diff_compare.md`
- `docs/implplan/SPRINT_20251226_013_FE_triage_canvas.md`
### External References
- [Snyk Reachability Analysis](https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/reachability-analysis)
- [Anchore Vulnerability Annotations](https://docs.anchore.com/current/docs/vulnerability_management/vuln_annotations/)
- [Prisma Cloud Runtime Defense](https://docs.prismacloud.io/en/compute-edition/30/admin-guide/runtime-defense/)