Unified Triage Experience Specification

Version: 1.0
Status: Active
Last Updated: 2025-12-26
Consolidated From: 3 product advisories (see References)

1. Executive Summary

The Problem

Modern container security generates overwhelming vulnerability data. Competitors offer fragmented solutions: Snyk provides reachability analysis, Anchore offers VEX annotations, Prisma Cloud delivers runtime signals. Security teams must context-switch between tools, losing precious time and context.

The Stella Ops Solution

A unified triage canvas that combines:

  • Rich evidence visualization with proof-carrying verdicts
  • VEX decisioning as first-class policy objects
  • AI-assisted analysis via AdvisoryAI
  • Attestable exceptions with audit trails
  • Offline-first architecture for air-gapped parity

2. Competitive Landscape

Snyk — Reachability + Continuous Context

  • Implements reachability analysis by building call graphs
  • Factors reachability into priority scores
  • Uses static analysis + AI + expert curation
  • Tracks issues over time without re-scanning unchanged images

Anchore — Vulnerability Annotations + VEX Export

  • Vulnerability annotation workflows via UI/API
  • Labels: "not applicable", "mitigated", "under investigation"
  • Export as OpenVEX and CycloneDX VEX
  • Downstream consumers receive curated exploitability state

Prisma Cloud — Runtime Defense

  • Continuous behavioral profiling
  • Process, file, and network rule enforcement
  • Learning models baseline expected behavior
  • Runtime context during operational incidents

Stella Ops Differentiation

| Feature | Snyk | Anchore | Prisma | Stella Ops |
|---|---|---|---|---|
| Reachability analysis | Yes | Partial | No | Yes (static + binary + runtime) |
| VEX as policy objects | No | Export only | No | First-class |
| Attestable exceptions | No | No | No | Yes (DSSE) |
| Offline replay | No | No | No | Yes |
| AI-assisted triage | Yes | No | No | Yes (AdvisoryAI) |
| Evidence graphs | Partial | No | Partial | Full chain |

3. Core UI Concepts

3.1 Visual Diff Pattern

Every policy decision or reachability change is treated as a visual diff, enabling quick, explainable triage.

Side-by-Side Panes

  • Before (previous scan/policy) vs After (current)
  • Show dependency/reachability subgraph
  • Highlight added/removed/changed nodes/edges

Evidence Strip (Right Rail)

Human-readable facts used by the engine:

  • Feature flag status (e.g., "feature flag OFF")
  • Code path analysis (e.g., "code path unreachable")
  • Runtime traces (e.g., "kernel eBPF trace absent")

Diff Verdict Header

Risk ↓ from Medium → Low (policy v1.8 → v1.9)

Filter Chips

Scope by: component, package, CVE, policy rule, environment
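
One way to read the filter chips is as a logical AND of optional equality constraints over a finding. The sketch below assumes a hypothetical `Finding` shape and hypothetical field names; the specification only names the five scoping dimensions, not how they are stored.

```typescript
// Hypothetical finding shape; only the five scoping dimensions come from the spec.
interface Finding {
  component: string;
  packageName: string;
  cve: string;
  policyRule: string;
  environment: string;
}

// Each active chip pins one dimension to a single value.
type ChipSelection = Partial<Finding>;

// A finding is in scope when it matches every active chip (logical AND).
function matchesChips(finding: Finding, chips: ChipSelection): boolean {
  const keys = Object.keys(chips) as (keyof Finding)[];
  return keys.every((key) => chips[key] === undefined || chips[key] === finding[key]);
}

// Narrow a list of findings to those matching the current chip selection.
function applyChips(findings: Finding[], chips: ChipSelection): Finding[] {
  return findings.filter((finding) => matchesChips(finding, chips));
}
```

With no chips active, `applyChips` returns the full list, which matches the usual behavior of an empty filter bar.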

3.2 Data Models

interface GraphSnapshot {
  nodes: GraphNode[];
  edges: GraphEdge[];
  metadata: { component: string; version: string; tags: string[] };
}

interface PolicySnapshot {
  version: string;
  rulesHash: string;
  inputs: { flags: Record<string, boolean>; env: string; vexSources: string[] };
}

interface Delta {
  added: DeltaItem[];
  removed: DeltaItem[];
  changed: DeltaItem[];
  ruleOutcomes: RuleOutcomeDelta[];
}

interface EvidenceItem {
  type: 'trace_hit' | 'sbom_line' | 'vex_claim' | 'config_value';
  source: string;
  digest: string;
  excerpt: string;
  timestamp: string;
}

interface SignedDeltaVerdict {
  status: 'routine' | 'review' | 'block';
  signatures: Signature[];
  producer: string;
}
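
To illustrate how the `added`, `removed`, and `changed` lists of a `Delta` might be derived from two snapshots, here is a minimal sketch. The specification references `GraphNode` and `DeltaItem` without defining them, so the shapes below (an `id` plus a `reachable` flag) are assumptions made for illustration, and `ruleOutcomes` is left out.

```typescript
// Hypothetical node shape: the spec references GraphNode without defining it.
interface GraphNode {
  id: string;
  reachable: boolean;
}

// Hypothetical item shape: DeltaItem is likewise left undefined in the spec.
interface DeltaItem {
  id: string;
  before?: GraphNode;
  after?: GraphNode;
}

interface NodeDelta {
  added: DeltaItem[];
  removed: DeltaItem[];
  changed: DeltaItem[];
}

type NodeIndex = { [id: string]: GraphNode | undefined };

// Index nodes by id so each lookup is a single property access.
function indexById(nodes: GraphNode[]): NodeIndex {
  const index: NodeIndex = Object.create(null);
  for (const node of nodes) {
    index[node.id] = node;
  }
  return index;
}

// Classify every node as added, removed, or changed between two snapshots.
function diffNodes(before: GraphNode[], after: GraphNode[]): NodeDelta {
  const beforeIndex = indexById(before);
  const afterIndex = indexById(after);
  const delta: NodeDelta = { added: [], removed: [], changed: [] };

  for (const node of after) {
    const previous = beforeIndex[node.id];
    if (previous === undefined) {
      delta.added.push({ id: node.id, after: node });
    } else if (previous.reachable !== node.reachable) {
      delta.changed.push({ id: node.id, before: previous, after: node });
    }
  }
  for (const node of before) {
    if (afterIndex[node.id] === undefined) {
      delta.removed.push({ id: node.id, before: node });
    }
  }
  return delta;
}
```

Keeping both the `before` and `after` node on a changed item gives the side-by-side panes what they need to render the difference without a second lookup.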

3.3 Micro-Interactions

| Interaction | Behavior |
|---|---|
| Hover changed node | Inline badge explaining "why it changed" |
| Click rule in rail | Spotlight the exact subgraph affected |
| Toggle "explain like I'm new" | Expands jargon into plain language |
| One-click "copy audit bundle" | Exports delta + evidence as attachment |

3.4 Keyboard Shortcuts

| Key | Action |
|---|---|
| 1 | Focus changes only |
| 2 | Show full graph |
| E | Expand evidence |
| A | Export audit bundle |
| N | Next item in queue |
| P | Previous item |
| M | Mark not affected |
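
The shortcut table could be held as a lookup from key to action. The sketch below is one possible shape, not the actual implementation: the action names are hypothetical, and both the case-insensitive matching and the rule that shortcuts are ignored while the user types in a form field are assumptions.

```typescript
// Hypothetical action names, one per row of the shortcut table.
type TriageAction =
  | 'focusChanges'
  | 'showFullGraph'
  | 'expandEvidence'
  | 'exportAuditBundle'
  | 'nextItem'
  | 'previousItem'
  | 'markNotAffected';

const SHORTCUTS: { [key: string]: TriageAction } = {
  '1': 'focusChanges',
  '2': 'showFullGraph',
  e: 'expandEvidence',
  a: 'exportAuditBundle',
  n: 'nextItem',
  p: 'previousItem',
  m: 'markNotAffected',
};

// Map a pressed key to an action, or null when the key is unbound.
// Assumption: shortcuts are suppressed while focus is in an editable field.
function resolveShortcut(key: string, typingInField: boolean): TriageAction | null {
  if (typingInField) {
    return null;
  }
  const normalized = key.toLowerCase();
  return Object.prototype.hasOwnProperty.call(SHORTCUTS, normalized)
    ? SHORTCUTS[normalized]
    : null;
}
```

A component would typically call `resolveShortcut(event.key, isEditable(event.target))` from a single `keydown` listener, where `isEditable` is another hypothetical helper.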

4. Risk Budget Visualization

4.1 Concept

  • Risk budget = allowable unresolved risk for a release (e.g., 100 "risk points")
  • Burn = consumption rate as alerts appear, minus "payback" from fixes

4.2 Dashboard Components

Heatmap of Unknowns

| Component | Vulns | Compliance | Perf | Data | Supply Chain |
|---|---|---|---|---|---|
| Service A | 🟡 12 | 🟢 0 | 🟡 3 | 🔴 8 | 🟡 5 |
| Service B | 🔴 24 | 🟡 2 | 🟢 1 | 🟡 4 | 🟢 0 |

Cell value = unknowns count × severity weight

Delta Table (Risk Decay per Release)

| Release | Before | After | Retired | Shifted | Unknowns |
|---|---|---|---|---|---|
| v2.3.1 | 85 | 67 | -22 | +4 | 12 |
| v2.3.0 | 92 | 85 | -15 | +8 | 18 |
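
In both sample rows the After column equals Before plus Retired plus Shifted (85 - 22 + 4 = 67 and 92 - 15 + 8 = 85). The specification does not state this relation explicitly, so the sketch below treats it as an assumption inferred from the sample data; the field names are hypothetical.

```typescript
// Hypothetical row shape mirroring the delta table columns.
interface ReleaseRiskRow {
  release: string;
  before: number;
  retired: number; // risk points removed by fixes (negative)
  shifted: number; // risk points newly introduced or moved in (positive)
  unknowns: number;
}

// Assumed relation, consistent with the sample rows: after = before + retired + shifted.
function riskAfter(row: ReleaseRiskRow): number {
  return row.before + row.retired + row.shifted;
}

const sampleRows: ReleaseRiskRow[] = [
  { release: 'v2.3.1', before: 85, retired: -22, shifted: 4, unknowns: 12 },
  { release: 'v2.3.0', before: 92, retired: -15, shifted: 8, unknowns: 18 },
];

for (const row of sampleRows) {
  console.log(row.release, riskAfter(row));
}
```

Run against the two sample rows, this logs 67 for v2.3.1 and 85 for v2.3.0, matching the After column.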

Exception Ledger

Every accepted risk has: ID, owner, expiry, evidence note, auto-reminder.
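
A ledger entry with the five fields listed above might look like the following sketch. The field names, the ISO 8601 timestamp format, and the idea of a configurable reminder lead time are assumptions; the specification only lists which attributes each accepted risk carries.

```typescript
// Hypothetical ledger entry; the five attributes come from the spec, the names do not.
interface RiskException {
  id: string;
  owner: string;
  expiresAt: string; // ISO 8601 timestamp (assumed format)
  evidenceNote: string;
  autoReminder: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An exception no longer covers its risk once the expiry time has passed.
function isExpired(entry: RiskException, now: Date): boolean {
  return Date.parse(entry.expiresAt) <= now.getTime();
}

// Entries that are still valid, have reminders enabled, and expire within leadDays.
function dueForReminder(ledger: RiskException[], now: Date, leadDays: number): RiskException[] {
  return ledger.filter((entry) => {
    if (!entry.autoReminder) {
      return false;
    }
    const remainingMs = Date.parse(entry.expiresAt) - now.getTime();
    return remainingMs > 0 && remainingMs <= leadDays * MS_PER_DAY;
  });
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the expiry logic deterministic and easy to test.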

4.3 Risk Budget Burn-Up Chart

Risk Points
    ^
100 |__________ Budget Line (flat or stepped)
    |         \
 80 |          \  ← Actual Risk (cumulative)
    |           \
 60 |            \_____ Headroom (green)
    |                 \
 40 |                  \__ Target by release
    |
    +---------------------------------> Time
         T-30    T-14   T-7   T-2   Release

  • X-axis: Calendar dates to code freeze
  • Y-axis: Risk points
  • Two lines: Budget (flat/stepped) + Actual Risk (daily)
  • Shaded area: Headroom (green) or Overrun (red)
  • Markers: Feature freeze, pen-test, dependency bumps
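
The shaded area of the chart can be derived per day as the gap between the budget line and the actual risk line. The following is a minimal sketch under that reading; the point shapes and the `headroom`/`overrun` labels are hypothetical names chosen to mirror the chart legend.

```typescript
// Hypothetical daily sample of the two chart lines.
interface BurnUpPoint {
  date: string;
  budget: number;
  actualRisk: number;
}

// Derived value for the shaded area between the two lines.
interface HeadroomPoint {
  date: string;
  headroom: number; // budget minus actual risk; negative means overrun
  state: 'headroom' | 'overrun';
}

// Compute the shaded-area series: green when under budget, red when over.
function headroomSeries(points: BurnUpPoint[]): HeadroomPoint[] {
  return points.map((point): HeadroomPoint => {
    const headroom = point.budget - point.actualRisk;
    return {
      date: point.date,
      headroom,
      state: headroom >= 0 ? 'headroom' : 'overrun',
    };
  });
}
```

Because the budget is carried on every point, the same function works for a flat budget and for a stepped one.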

4.4 Computation Formulas

// Risk points per issue
risk_points = severity_weight × exposure_factor × evidence_freshness_penalty

// Unknown penalty (newest evidence is older than the threshold of N days)
if (days_since_evidence > threshold) {
  risk_points *= 1.5; // multiplier
}

// Decay on fix
if (fix_landed && evidence_refreshed) {
  subtract_points(issue.risk_points);
}

// Guardrails
if (unknowns > K || actual_risk > budget) {
  fail_gate();
}
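
The formulas above can be sketched as runnable TypeScript. The 1.5 multiplier and the gate condition come from the pseudocode; the field names, the `RiskIssue` and `GateInput` shapes, and the choice to count a fixed issue as zero points are assumptions made for illustration.

```typescript
// Hypothetical issue shape carrying the inputs named in the formulas.
interface RiskIssue {
  severityWeight: number;
  exposureFactor: number;
  evidenceFreshnessPenalty: number;
  daysSinceEvidence: number;
  fixLanded: boolean;
  evidenceRefreshed: boolean;
}

const STALE_EVIDENCE_MULTIPLIER = 1.5; // multiplier from the pseudocode

// Risk points for one issue, including the unknown penalty and decay on fix.
function riskPoints(issue: RiskIssue, staleAfterDays: number): number {
  if (issue.fixLanded && issue.evidenceRefreshed) {
    return 0; // decay on fix: the issue no longer consumes budget
  }
  const base = issue.severityWeight * issue.exposureFactor * issue.evidenceFreshnessPenalty;
  return issue.daysSinceEvidence > staleAfterDays ? base * STALE_EVIDENCE_MULTIPLIER : base;
}

// Hypothetical gate input; maxUnknowns corresponds to K in the pseudocode.
interface GateInput {
  issues: RiskIssue[];
  unknowns: number;
  maxUnknowns: number;
  budget: number;
  staleAfterDays: number;
}

// The gate fails when unknowns exceed K or actual risk exceeds the budget.
function gatePasses(input: GateInput): boolean {
  const actualRisk = input.issues.reduce(
    (sum, issue) => sum + riskPoints(issue, input.staleAfterDays),
    0,
  );
  return !(input.unknowns > input.maxUnknowns || actualRisk > input.budget);
}
```

Both comparisons are strict, as in the pseudocode, so a release sitting exactly on its budget still passes the gate.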

5. Implementation Components

5.1 Component Hierarchy

TriageCanvasComponent
├── TriageListComponent
│   ├── SeverityFilterComponent
│   ├── VulnerabilityRowComponent
│   └── BulkActionBarComponent
├── TriageDetailComponent
│   ├── AffectedPackagesPanel
│   ├── AdvisoryRefsPanel
│   ├── ReachabilityContextComponent
│   └── EvidenceProvenanceComponent
├── AiRecommendationPanel
│   ├── ReachabilityExplanation
│   ├── SuggestedJustification
│   └── SimilarVulnsComponent
├── VexDecisionModalComponent
│   ├── StatusSelector
│   ├── JustificationTypeSelector
│   ├── EvidenceRefInput
│   └── ScopeSelector
└── VexHistoryComponent

CompareViewComponent
├── BaselineSelectorComponent
├── TrustIndicatorsComponent
├── DeltaSummaryStripComponent
├── ThreePaneLayoutComponent
│   ├── CategoriesPaneComponent
│   ├── ItemsPaneComponent
│   └── ProofPaneComponent
├── ActionablesPanelComponent
└── ExportActionsComponent

RiskDashboardComponent
├── BurnUpChartComponent
├── UnknownsHeatmapComponent
├── DeltaTableComponent
├── ExceptionLedgerComponent
└── KpiTilesComponent

5.2 Service Layer

// Core services
TriageService           // Vulnerability list + filtering
VexDecisionService      // CRUD for VEX decisions
AdvisoryAiService       // AI recommendations
CompareService          // Baseline + delta computation
RiskBudgetService       // Budget + burn tracking
EvidenceService         // Evidence retrieval

6. API Integration

VulnExplorer Endpoints

GET  /api/v1/vulnerabilities                    // List with filters
GET  /api/v1/vulnerabilities/{id}               // Detail
GET  /api/v1/vulnerabilities/{id}/reachability  // Call graph slice
POST /api/v1/vex-decisions                      // Create VEX decision
PUT  /api/v1/vex-decisions/{id}                 // Update VEX decision
GET  /api/v1/vex-decisions?vulnId={id}          // History for vuln
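
A client might build these URLs with small helpers like the ones sketched below. The paths are taken from the list above; the helper names are hypothetical, and so are the filter parameter names, since the specification does not say which filters the list endpoint accepts.

```typescript
const API_BASE = '/api/v1';

// Build the list URL; filter names are hypothetical and sorted for stable output.
function vulnerabilityListUrl(filters: { [name: string]: string }): string {
  const query = Object.keys(filters)
    .sort()
    .map((name) => encodeURIComponent(name) + '=' + encodeURIComponent(filters[name]))
    .join('&');
  const path = API_BASE + '/vulnerabilities';
  return query === '' ? path : path + '?' + query;
}

// Call graph slice for one vulnerability.
function reachabilityUrl(vulnId: string): string {
  return `${API_BASE}/vulnerabilities/${encodeURIComponent(vulnId)}/reachability`;
}

// VEX decision history for one vulnerability.
function vexHistoryUrl(vulnId: string): string {
  return `${API_BASE}/vex-decisions?vulnId=${encodeURIComponent(vulnId)}`;
}
```

Encoding each identifier with `encodeURIComponent` keeps an id containing `/` or `?` from changing the shape of the request path.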

AdvisoryAI Endpoints

POST /api/v1/advisory/plan                      // Get analysis plan
POST /api/v1/advisory/execute                   // Execute analysis
GET  /api/v1/advisory/output/{taskId}           // Get recommendations

Delta/Compare Endpoints

GET  /api/v1/baselines/recommendations/{digest}
POST /api/v1/delta/compute
GET  /api/v1/delta/{id}/trust-indicators
GET  /api/v1/actionables/delta/{id}
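
The `{digest}` and `{id}` placeholders in these paths can be filled with a small template expander. This is a sketch, not the project's client code: `expandPath` and the `DELTA_ENDPOINTS` keys are hypothetical names, and percent-encoding the digest is an assumption about what the server accepts.

```typescript
// Path templates copied from the endpoint list; the keys are hypothetical.
const DELTA_ENDPOINTS = {
  baselineRecommendations: '/api/v1/baselines/recommendations/{digest}',
  trustIndicators: '/api/v1/delta/{id}/trust-indicators',
  actionables: '/api/v1/actionables/delta/{id}',
};

// Replace each {name} placeholder with its encoded value; fail fast on gaps.
function expandPath(template: string, params: { [name: string]: string }): string {
  return template.replace(/\{(\w+)\}/g, (_placeholder: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(params, name)) {
      throw new Error('Missing path parameter: ' + name);
    }
    return encodeURIComponent(params[name]);
  });
}
```

For example, `expandPath(DELTA_ENDPOINTS.trustIndicators, { id: 'd-42' })` yields `/api/v1/delta/d-42/trust-indicators`. Throwing on a missing parameter surfaces a bad call at the call site instead of sending a request with a literal `{id}` in the path.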

7. Implementation Status

| Component | Sprint | Status |
|---|---|---|
| Risk Dashboard Base | SPRINT_20251226_004_FE | TODO |
| Smart-Diff Compare View | SPRINT_20251226_012_FE | TODO |
| Unified Triage Canvas | SPRINT_20251226_013_FE | TODO |
| Documentation Consolidation | SPRINT_20251226_014_DOCS | TODO |
| VEX Decision Models | VulnExplorer/Models | COMPLETE |
| AdvisoryAI Pipeline | src/AdvisoryAI | COMPLETE |
| Confidence Badge | Web/shared/components | COMPLETE |
| Release Flow | Web/features/releases | COMPLETE |

8. Testing Strategy

Unit Tests

  • Component behavior (selection, filtering, expansion)
  • Signal/computed derivations
  • Role-based view switching
  • Form validation (VEX decisions)

Integration Tests

  • API service calls and response handling
  • Navigation and routing
  • State persistence across route changes

E2E Tests

  • Full triage workflow: list → detail → VEX decision
  • Comparison workflow: select baseline → compute delta → export
  • Risk budget: view charts → create exception → see update

Accessibility Tests

  • Keyboard navigation completeness
  • Screen reader announcements
  • Color contrast compliance

9. Success Metrics

| Metric | Definition | Target |
|---|---|---|
| Mean Time to Triage (MTTT) | Time from vuln notification to VEX decision | < 5 min |
| Mean Time to Explain (MTTE) | Time from "why did this change?" to "Understood" click | < 2 min |
| Triage Queue Throughput | Vulns triaged per hour per analyst | > 20 |
| AI Recommendation Acceptance | % of AI suggestions accepted without modification | > 60% |
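
Two of these metrics can be computed directly from per-vulnerability triage records, as in the sketch below. The `TriageRecord` shape and its field names are hypothetical; the specification defines the metrics but not how the underlying events are recorded.

```typescript
// Hypothetical record of one triaged vulnerability.
interface TriageRecord {
  notifiedAt: string; // ISO 8601 time of the vuln notification
  decidedAt: string; // ISO 8601 time of the VEX decision
  aiSuggestion: 'none' | 'accepted' | 'modified' | 'rejected';
}

// MTTT: mean minutes from notification to VEX decision.
function meanTimeToTriageMinutes(records: TriageRecord[]): number {
  if (records.length === 0) {
    return 0;
  }
  const totalMs = records.reduce(
    (sum, record) => sum + (Date.parse(record.decidedAt) - Date.parse(record.notifiedAt)),
    0,
  );
  return totalMs / records.length / 60000;
}

// Share of AI suggestions accepted without modification, in the range 0..1.
function aiAcceptanceRate(records: TriageRecord[]): number {
  const suggested = records.filter((record) => record.aiSuggestion !== 'none');
  if (suggested.length === 0) {
    return 0;
  }
  const accepted = suggested.filter((record) => record.aiSuggestion === 'accepted').length;
  return accepted / suggested.length;
}
```

Records without an AI suggestion are excluded from the acceptance denominator so that the rate reflects only cases where a suggestion was actually offered.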

10. References

Archived Advisories (Consolidated Here)

  • archived/2025-12-26-triage-advisories/25-Dec-2025 - Triage UI Lessons from Competitors.md
  • archived/2025-12-26-triage-advisories/25-Dec-2025 - Visual Diffs for Explainable Triage.md
  • archived/2025-12-26-triage-advisories/26-Dec-2026 - Visualizing the Risk Budget.md
  • docs/modules/web/smart-diff-ui-architecture.md
  • docs/implplan/SPRINT_20251226_004_FE_risk_dashboard.md
  • docs/implplan/SPRINT_20251226_012_FE_smart_diff_compare.md
  • docs/implplan/SPRINT_20251226_013_FE_triage_canvas.md

External References