feat: Add archived advisories and implement smart-diff as a core evidence primitive

- Introduced new advisory documents for archived superseded advisories, including detailed descriptions of features already implemented or covered by existing sprints.
- Added "Smart-Diff as a Core Evidence Primitive" advisory outlining the treatment of SBOM diffs as first-class evidence objects, enhancing vulnerability verdicts with deterministic replayability.
- Created "Visual Diffs for Explainable Triage" advisory to improve user experience in understanding policy decisions and reachability changes through visual diffs.
- Implemented "Weighted Confidence for VEX Sources" advisory to rank conflicting vulnerability evidence based on freshness and confidence, facilitating better decision-making.
- Established a signer module charter detailing the mission, expectations, key components, and signing modes for cryptographic signing services in StellaOps.
- Consolidated overlapping concepts from triage UI, visual diffs, and risk budget visualization advisories into a unified specification for better clarity and implementation tracking.
2025-12-26 13:01:43 +02:00
parent 22390057fc
commit 7792749bb4
50 changed files with 6844 additions and 130 deletions
--- a/docs/modules/signer/guides/keyless-signing.md
+++ b/docs/modules/signer/guides/keyless-signing.md
@@ -0,0 +1,273 @@
+# Keyless Signing Guide
+
+## Overview
+
+Keyless signing uses ephemeral X.509 certificates from Sigstore Fulcio, eliminating the need for persistent signing keys. This approach is ideal for CI/CD pipelines where key management is complex and error-prone.
+
+### How It Works
+
+```
+┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
+│ CI Pipeline  │────▶│ OIDC Provider│────▶│ Fulcio       │────▶│ Rekor        │
+│              │     │ (GitHub/GL)  │     │ (Sigstore)   │     │ (Sigstore)   │
+│ 1. Get token │     │ 2. Issue JWT │     │ 3. Issue cert│     │ 4. Log entry │
+│              │     │    (5 min)   │     │    (10 min)  │     │ (permanent)  │
+└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
+        │                                                              │
+        │                                                              │
+        └───────────── Attestation with cert + Rekor proof ───────────┘
+```
+
+1. **OIDC Token**: The pipeline requests an identity token from the CI platform's OIDC provider
+2. **Fulcio Certificate**: The token is exchanged for a short-lived signing certificate (~10 min)
+3. **Ephemeral Key**: The private key exists only in memory during signing
+4. **Rekor Logging**: The signature is logged to the transparency log so it can be verified after the certificate expires
+
+### Key Benefits
+
+| Benefit | Description |
+|---------|-------------|
+| **Zero Key Management** | No secrets to rotate, store, or protect |
+| **Identity Binding** | Signatures tied to OIDC identity (repo, branch, workflow) |
+| **Audit Trail** | All signatures logged to Rekor transparency log |
+| **Short-lived Certs** | Minimizes exposure window (~10 minutes) |
+| **Industry Standard** | Adopted by Kubernetes, npm, PyPI, and other major ecosystems |
+
+## Quick Start
+
+### Prerequisites
+
+1. StellaOps CLI installed
+2. CI platform with OIDC support (GitHub Actions, GitLab CI, Gitea)
+3. Network access to Fulcio and Rekor (or private instances)
+
+### GitHub Actions Example
+
+```yaml
+name: Sign Container Image
+
+on:
+  push:
+    branches: [main]
+
+jobs:
+  build-and-sign:
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write  # Required for OIDC
+      contents: read
+      packages: write
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Build and Push Image
+        id: build
+        run: |
+          docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
+          docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
+          echo "digest=$(docker inspect --format='{{index .RepoDigests 0}}' ghcr.io/${{ github.repository }}:${{ github.sha }} | cut -d@ -f2)" >> "$GITHUB_OUTPUT"
+
+      - name: Keyless Sign
+        uses: stella-ops/sign-action@v1
+        with:
+          artifact-digest: ${{ steps.build.outputs.digest }}
+          artifact-type: image
+```
+
+### CLI Usage
+
+```bash
+# Sign with ambient OIDC token (in CI environment)
+stella attest sign --keyless --artifact sha256:abc123...
+
+# Sign with explicit token
+STELLAOPS_OIDC_TOKEN="..." stella attest sign --keyless --artifact sha256:abc123...
+
+# Verify signature (checks Rekor proof)
+stella attest verify \
+  --artifact sha256:abc123... \
+  --certificate-identity "repo:myorg/myrepo:ref:refs/heads/main" \
+  --certificate-oidc-issuer "https://token.actions.githubusercontent.com"
+```
+
+## Configuration
+
+### Signer Configuration
+
+```yaml
+# etc/signer.yaml
+signer:
+  signing:
+    mode: "keyless"
+    keyless:
+      enabled: true
+      fulcio:
+        url: "https://fulcio.sigstore.dev"
+        timeout: 30s
+        retries: 3
+      oidc:
+        issuer: "https://authority.internal"
+        clientId: "signer-keyless"
+        useAmbientToken: true
+      algorithms:
+        preferred: "ECDSA_P256"
+        allowed: ["ECDSA_P256", "Ed25519"]
+      certificate:
+        rootBundlePath: "/etc/stellaops/fulcio-roots.pem"
+        validateChain: true
+        requireSCT: true
+```
+
+### Private Fulcio Instance
+
+For air-gapped or high-security environments, deploy a private Fulcio instance:
+
+```yaml
+signer:
+  signing:
+    keyless:
+      fulcio:
+        url: "https://fulcio.internal.example.com"
+      oidc:
+        issuer: "https://keycloak.internal.example.com/realms/stellaops"
+      certificate:
+        rootBundlePath: "/etc/stellaops/private-fulcio-roots.pem"
+```
+
+## Identity Verification
+
+### Identity Constraints
+
+When verifying signatures, specify which identities are trusted:
+
+```bash
+stella attest verify \
+  --artifact sha256:abc123... \
+  --certificate-identity "repo:myorg/myrepo:ref:refs/heads/main" \
+  --certificate-oidc-issuer "https://token.actions.githubusercontent.com"
+```
+
+### Platform Identity Patterns
+
+#### GitHub Actions
+
+| Pattern | Matches |
+|---------|---------|
+| `repo:org/repo:.*` | Any ref in repository |
+| `repo:org/repo:ref:refs/heads/main` | Main branch only |
+| `repo:org/repo:ref:refs/tags/v.*` | Version tags |
+| `repo:org/repo:environment:production` | Production environment |
+
+**Issuer:** `https://token.actions.githubusercontent.com`
+
+#### GitLab CI
+
+| Pattern | Matches |
+|---------|---------|
+| `project_path:group/project:.*` | Any ref in project |
+| `project_path:group/project:ref_type:branch:ref:main` | Main branch |
+| `project_path:group/project:ref_protected:true` | Protected refs only |
+
+**Issuer:** `https://gitlab.com` (or self-hosted URL)
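
The identity patterns in the tables above can be read as a pair of checks: an exact match on the issuer and a pattern match on the identity. The sketch below illustrates that reading. The type and function names are hypothetical, and treating each pattern as a regular expression anchored to the whole identity string is an assumption, not confirmed StellaOps behavior.

```typescript
// Hypothetical shapes for illustration; these are not the StellaOps API.
interface CertificateClaims {
  identity: string; // e.g. "repo:myorg/myrepo:ref:refs/heads/main"
  issuer: string;   // e.g. "https://token.actions.githubusercontent.com"
}

interface TrustConstraint {
  identityPattern: string; // regular expression source, as in the tables above
  issuer: string;          // pinned issuer, compared exactly
}

// Assumption: a pattern must match the entire identity, not a substring of it.
function matchesConstraint(claims: CertificateClaims, constraint: TrustConstraint): boolean {
  if (claims.issuer !== constraint.issuer) {
    return false; // the issuer is pinned and never pattern-matched
  }
  // Anchoring prevents "repo:org/repo:.*" from matching "x/repo:org/repo:y".
  const anchored = new RegExp("^(?:" + constraint.identityPattern + ")$");
  return anchored.test(claims.identity);
}

const githubIssuer = "https://token.actions.githubusercontent.com";
const versionTags: TrustConstraint = {
  identityPattern: "repo:myorg/myrepo:ref:refs/tags/v.*",
  issuer: githubIssuer,
};

const tagBuild = matchesConstraint(
  { identity: "repo:myorg/myrepo:ref:refs/tags/v1.2.3", issuer: githubIssuer },
  versionTags
); // true

const branchBuild = matchesConstraint(
  { identity: "repo:myorg/myrepo:ref:refs/heads/feature", issuer: githubIssuer },
  versionTags
); // false
```

Keeping the issuer comparison exact while only the identity is pattern-matched mirrors the "pin OIDC issuers" advice later in the guide.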
+
+## Long-Term Verification
+
+### The Problem
+
+Fulcio certificates expire in ~10 minutes. How do you verify signatures months later?
+
+### The Solution: Rekor Proofs
+
+```
+At signing time:
+┌──────────────────────────────────────────────────────────────┐
+│ Signature + Certificate + Signed-Certificate-Timestamp (SCT) │
+│                            ↓                                  │
+│                 Logged to Rekor                               │
+│                            ↓                                  │
+│         Merkle Inclusion Proof returned                       │
+└──────────────────────────────────────────────────────────────┘
+
+At verification time (even years later):
+┌──────────────────────────────────────────────────────────────┐
+│ 1. Check signature is valid (using cert public key)          │
+│ 2. Check SCT proves cert was logged when valid               │
+│ 3. Check Rekor inclusion proof (entry was logged)            │
+│ 4. Check signing time was within cert validity window        │
+│                            ↓                                  │
+│              Signature is valid! ✓                            │
+└──────────────────────────────────────────────────────────────┘
+```
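
The four verification-time checks in the diagram above combine into a single decision. A minimal sketch, assuming the cryptographic checks (signature, SCT, inclusion proof) are performed elsewhere and arrive as booleans; every name here is hypothetical and only the decision logic is shown.

```typescript
// Hypothetical input shape: results of checks 1-3 plus the data needed for check 4.
interface VerificationEvidence {
  signatureValid: boolean;      // 1. signature verifies against the certificate's public key
  sctValid: boolean;            // 2. SCT check passed
  inclusionProofValid: boolean; // 3. Rekor inclusion proof verifies
  certNotBefore: Date;          // start of the certificate validity window
  certNotAfter: Date;           // end of the certificate validity window
  loggedAt: Date;               // signing time as recorded by the transparency log
}

// Returns the names of the failed checks; an empty array means the signature is accepted.
function failedChecks(evidence: VerificationEvidence): string[] {
  const failures: string[] = [];
  if (!evidence.signatureValid) failures.push("signature");
  if (!evidence.sctValid) failures.push("sct");
  if (!evidence.inclusionProofValid) failures.push("rekor-inclusion");

  // 4. The logged signing time must fall inside the certificate validity window.
  const signedAt = evidence.loggedAt.getTime();
  if (signedAt < evidence.certNotBefore.getTime() || signedAt > evidence.certNotAfter.getTime()) {
    failures.push("signing-time");
  }
  return failures;
}

const withinWindow = failedChecks({
  signatureValid: true,
  sctValid: true,
  inclusionProofValid: true,
  certNotBefore: new Date("2025-12-26T11:00:00Z"),
  certNotAfter: new Date("2025-12-26T11:10:00Z"),
  loggedAt: new Date("2025-12-26T11:03:00Z"),
}); // []
```

The current wall-clock time is never consulted, which is why the result is the same whether verification runs minutes or years after signing.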
+
+### Attestation Bundles
+
+For air-gapped verification, StellaOps bundles attestations with proofs:
+
+```bash
+# Export bundle with Rekor proofs
+stella attest export-bundle \
+  --image sha256:abc123... \
+  --include-proofs \
+  --output attestation-bundle.json
+
+# Verify offline
+stella attest verify --offline \
+  --bundle attestation-bundle.json \
+  --artifact sha256:abc123...
+```
+
+## Troubleshooting
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `OIDC token expired` | Token older than 5 minutes | Re-acquire token before signing |
+| `Fulcio unavailable` | Network issues | Check connectivity, increase timeout |
+| `Certificate chain invalid` | Wrong Fulcio roots | Update root bundle |
+| `Identity mismatch` | Wrong verify constraints | Check issuer and identity patterns |
+| `Rekor proof missing` | Logging failed | Retry signing, check Rekor status |
+
+### Debug Mode
+
+```bash
+# Enable verbose logging
+STELLAOPS_LOG_LEVEL=debug stella attest sign --keyless --artifact sha256:...
+
+# Inspect certificate details
+stella attest inspect --artifact sha256:... --show-cert
+```
+
+## Security Considerations
+
+### Best Practices
+
+1. **Always verify identity**: Never accept `.*` as the full identity pattern
+2. **Require Rekor proofs**: Use `--require-rekor` for production verification
+3. **Pin OIDC issuers**: Only trust expected issuers
+4. **Use environment constraints**: They are more specific than branch names
+5. **Monitor signing activity**: Alert on unexpected identities
+
+### Threat Model
+
+| Threat | Mitigation |
+|--------|------------|
+| Stolen OIDC token | Short lifetime (~5 min), audience binding |
+| Fulcio compromise | Certificate Transparency (SCT), multiple roots |
+| Rekor compromise | Multiple witnesses, checkpoints, consistency proofs |
+| Private key theft | Ephemeral keys, never persisted |
+
+## Related Documentation
+
+- [Signer Architecture](../architecture.md)
+- [Attestor Bundle Format](../../attestor/bundle-format.md)
+- [Air-Gap Verification](../../../airgap/attestation-verification.md)
+- [CI/CD Integration](../../../guides/cicd-signing.md)
+
+## External Resources
+
+- [Sigstore Documentation](https://docs.sigstore.dev/)
+- [Fulcio Overview](https://docs.sigstore.dev/certificate_authority/overview/)
+- [Rekor Transparency Log](https://docs.sigstore.dev/logging/overview/)
+- [cosign Keyless Signing](https://docs.sigstore.dev/signing/quickstart/)
--- a/docs/modules/web/unified-triage-specification.md
+++ b/docs/modules/web/unified-triage-specification.md
@@ -0,0 +1,348 @@
+# Unified Triage Experience Specification
+
+**Version:** 1.0
+**Status:** Active
+**Last Updated:** 2025-12-26
+**Consolidated From:** 3 product advisories (see References)
+
+## 1. Executive Summary
+
+### The Problem
+Modern container security scanning generates an overwhelming volume of vulnerability data. Competitors offer fragmented solutions: Snyk provides reachability analysis, Anchore offers VEX annotations, Prisma Cloud delivers runtime signals. Security teams must context-switch between tools, losing time and context with every switch.
+
+### The Stella Ops Solution
+A **unified triage canvas** that combines:
+- Rich evidence visualization with proof-carrying verdicts
+- VEX decisioning as first-class policy objects
+- AI-assisted analysis via AdvisoryAI
+- Attestable exceptions with audit trails
+- Offline-first architecture for air-gapped parity
+
+## 2. Competitive Landscape
+
+### Snyk — Reachability + Continuous Context
+- Implements reachability analysis building call graphs
+- Factors reachability into priority scores
+- Uses static analysis + AI + expert curation
+- Tracks issues over time without re-scanning unchanged images
+
+### Anchore — Vulnerability Annotations + VEX Export
+- Vulnerability annotation workflows via UI/API
+- Labels: "not applicable", "mitigated", "under investigation"
+- Export as OpenVEX and CycloneDX VEX
+- Downstream consumers receive curated exploitability state
+
+### Prisma Cloud — Runtime Defense
+- Continuous behavioral profiling
+- Process, file, and network rule enforcement
+- Learning models baseline expected behavior
+- Runtime context during operational incidents
+
+### Stella Ops Differentiation
+| Feature | Snyk | Anchore | Prisma | Stella Ops |
+|---------|------|---------|--------|------------|
+| Reachability analysis | Yes | Partial | No | Yes (static + binary + runtime) |
+| VEX as policy objects | No | Export only | No | **First-class** |
+| Attestable exceptions | No | No | No | **Yes (DSSE)** |
+| Offline replay | No | No | No | **Yes** |
+| AI-assisted triage | Yes | No | No | Yes (AdvisoryAI) |
+| Evidence graphs | Partial | No | Partial | **Full chain** |
+
+## 3. Core UI Concepts
+
+### 3.1 Visual Diff Pattern
+Every policy decision or reachability change is treated as a **visual diff**, enabling quick, explainable triage.
+
+#### Side-by-Side Panes
+- **Before** (previous scan/policy) vs **After** (current)
+- Show dependency/reachability subgraph
+- Highlight added/removed/changed nodes/edges
+
+#### Evidence Strip (Right Rail)
+Human-readable facts used by the engine:
+- Feature flag status (e.g., "feature flag OFF")
+- Code path analysis (e.g., "code path unreachable")
+- Runtime traces (e.g., "kernel eBPF trace absent")
+
+#### Diff Verdict Header
+```
+Risk ↓ from Medium → Low (policy v1.8 → v1.9)
+```
+
+#### Filter Chips
+Scope by: component, package, CVE, policy rule, environment
+
+### 3.2 Data Models
+
+```typescript
+interface GraphSnapshot {
+  nodes: GraphNode[];
+  edges: GraphEdge[];
+  metadata: { component: string; version: string; tags: string[] };
+}
+
+interface PolicySnapshot {
+  version: string;
+  rulesHash: string;
+  inputs: { flags: Record<string, boolean>; env: string; vexSources: string[] };
+}
+
+interface Delta {
+  added: DeltaItem[];
+  removed: DeltaItem[];
+  changed: DeltaItem[];
+  ruleOutcomes: RuleOutcomeDelta[];
+}
+
+interface EvidenceItem {
+  type: 'trace_hit' | 'sbom_line' | 'vex_claim' | 'config_value';
+  source: string;
+  digest: string;
+  excerpt: string;
+  timestamp: string;
+}
+
+interface SignedDeltaVerdict {
+  status: 'routine' | 'review' | 'block';
+  signatures: Signature[];
+  producer: string;
+}
+```
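
To illustrate how the `added` / `removed` / `changed` buckets of a `Delta` might be filled from two snapshots, here is a minimal sketch restricted to nodes. The specification does not define `GraphNode` or `DeltaItem`, so the shapes below (an `id` plus a content `digest`) are assumptions made for the example.

```typescript
// Hypothetical minimal shapes; the specification leaves GraphNode and DeltaItem undefined.
interface GraphNode {
  id: string;
  digest: string; // content hash used to detect changes
}

interface DeltaItem {
  id: string;
  before?: string; // digest in the baseline snapshot
  after?: string;  // digest in the current snapshot
}

interface NodeDelta {
  added: DeltaItem[];
  removed: DeltaItem[];
  changed: DeltaItem[];
}

function diffNodes(before: GraphNode[], after: GraphNode[]): NodeDelta {
  // Index both sides by id so each lookup is constant time.
  const beforeById: Record<string, GraphNode> = {};
  for (const node of before) beforeById[node.id] = node;
  const afterIds: Record<string, boolean> = {};
  for (const node of after) afterIds[node.id] = true;

  const delta: NodeDelta = { added: [], removed: [], changed: [] };
  for (const node of after) {
    const previous = beforeById[node.id];
    if (previous === undefined) {
      delta.added.push({ id: node.id, after: node.digest });
    } else if (previous.digest !== node.digest) {
      delta.changed.push({ id: node.id, before: previous.digest, after: node.digest });
    }
  }
  for (const node of before) {
    if (!afterIds[node.id]) delta.removed.push({ id: node.id, before: node.digest });
  }
  return delta;
}

const nodeDelta = diffNodes(
  [{ id: "a", digest: "d1" }, { id: "b", digest: "d2" }, { id: "c", digest: "d3" }],
  [{ id: "a", digest: "d1" }, { id: "b", digest: "d9" }, { id: "d", digest: "d4" }]
);
// nodeDelta.added holds "d", nodeDelta.changed holds "b", nodeDelta.removed holds "c"
```

Output order follows input order, so identical inputs always produce an identical delta, in keeping with the deterministic replay goal stated in the commit message.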
+
+### 3.3 Micro-Interactions
+
+| Interaction | Behavior |
+|-------------|----------|
+| Hover changed node | Inline badge explaining "why it changed" |
+| Click rule in rail | Spotlight the exact subgraph affected |
+| Toggle "explain like I'm new" | Expands jargon into plain language |
+| One-click "copy audit bundle" | Exports delta + evidence as attachment |
+
+### 3.4 Keyboard Shortcuts
+
+| Key | Action |
+|-----|--------|
+| `1` | Focus changes only |
+| `2` | Show full graph |
+| `E` | Expand evidence |
+| `A` | Export audit bundle |
+| `N` | Next item in queue |
+| `P` | Previous item |
+| `M` | Mark not affected |
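
One way to wire the shortcut table above is a plain key-to-action lookup. The action names are hypothetical, and two behaviors are assumptions not stated in the specification: matching is case-insensitive, and shortcuts are suppressed while the user is typing in a form field.

```typescript
// Hypothetical action identifiers, one per row of the shortcut table.
type TriageAction =
  | "focusChanges"
  | "showFullGraph"
  | "expandEvidence"
  | "exportAuditBundle"
  | "nextItem"
  | "previousItem"
  | "markNotAffected";

const SHORTCUTS: Record<string, TriageAction> = {
  "1": "focusChanges",
  "2": "showFullGraph",
  E: "expandEvidence",
  A: "exportAuditBundle",
  N: "nextItem",
  P: "previousItem",
  M: "markNotAffected",
};

// Assumptions: keys match case-insensitively; nothing fires while typing in an input.
function actionForKey(key: string, typingInField: boolean): TriageAction | undefined {
  if (typingInField) return undefined;
  const normalized = key.toUpperCase();
  return Object.prototype.hasOwnProperty.call(SHORTCUTS, normalized)
    ? SHORTCUTS[normalized]
    : undefined;
}

const onLowercaseM = actionForKey("m", false); // "markNotAffected"
const whileTyping = actionForKey("M", true);   // undefined
```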
+
+## 4. Risk Budget Visualization
+
+### 4.1 Concept
+- **Risk budget** = allowable unresolved risk for a release (e.g., 100 "risk points")
+- **Burn** = consumption rate as alerts appear, minus "payback" from fixes
+
+### 4.2 Dashboard Components
+
+#### Heatmap of Unknowns
+| Component | Vulns | Compliance | Perf | Data | Supply Chain |
+|-----------|-------|------------|------|------|--------------|
+| Service A | 🟡 12 | 🟢 0 | 🟡 3 | 🔴 8 | 🟡 5 |
+| Service B | 🔴 24 | 🟡 2 | 🟢 1 | 🟡 4 | 🟢 0 |
+
+Cell value = unknowns count × severity weight
+
+#### Delta Table (Risk Decay per Release)
+| Release | Before | After | Retired | Shifted | Unknowns |
+|---------|--------|-------|---------|---------|----------|
+| v2.3.1 | 85 | 67 | -22 | +4 | 12 |
+| v2.3.0 | 92 | 85 | -15 | +8 | 18 |
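
The sample rows above are consistent with `After = Before + Retired + Shifted`, where Retired is negative and Shifted is positive. That relationship is inferred from the numbers rather than stated in the specification; the sketch below checks it with hypothetical field names.

```typescript
// Hypothetical row shape mirroring the delta table columns.
interface ReleaseDelta {
  release: string;
  before: number;
  after: number;
  retired: number; // negative: risk points removed by fixes
  shifted: number; // positive: risk points newly introduced or moved in
}

// Assumption inferred from the sample rows: after = before + retired + shifted.
function expectedAfter(row: ReleaseDelta): number {
  return row.before + row.retired + row.shifted;
}

const sampleRows: ReleaseDelta[] = [
  { release: "v2.3.1", before: 85, after: 67, retired: -22, shifted: 4 },
  { release: "v2.3.0", before: 92, after: 85, retired: -15, shifted: 8 },
];

// Releases whose reported "After" disagrees with the inferred formula.
const inconsistentReleases = sampleRows
  .filter((row) => expectedAfter(row) !== row.after)
  .map((row) => row.release);
// inconsistentReleases is empty for the two sample rows: 85 - 22 + 4 = 67 and 92 - 15 + 8 = 85
```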
+
+#### Exception Ledger
+Every accepted risk has: ID, owner, expiry, evidence note, auto-reminder.
+
+### 4.3 Risk Budget Burn-Up Chart
+
+```
+Risk Points
+    ^
+100 |__________ Budget Line (flat or stepped)
+    |         \
+ 80 |          \  ← Actual Risk (cumulative)
+    |           \
+ 60 |            \_____ Headroom (green)
+    |                 \
+ 40 |                  \__ Target by release
+    |
+    +---------------------------------> Time
+         T-30    T-14   T-7   T-2   Release
+```
+
+- **X-axis:** Calendar dates to code freeze
+- **Y-axis:** Risk points
+- **Two lines:** Budget (flat/stepped) + Actual Risk (daily)
+- **Shaded area:** Headroom (green) or Overrun (red)
+- **Markers:** Feature freeze, pen-test, dependency bumps
+
+### 4.4 Computation Formulas
+
+```typescript
+// Risk points per issue
+risk_points = severity_weight × exposure_factor × evidence_freshness_penalty
+
+// Unknown penalty: no evidence collected within the last N days (threshold = N)
+if (days_since_evidence > threshold) {
+  risk_points *= 1.5; // multiplier
+}
+
+// Decay on fix
+if (fix_landed && evidence_refreshed) {
+  subtract_points(issue.risk_points);
+}
+
+// Guardrails
+if (unknowns > K || actual_risk > budget) {
+  fail_gate();
+}
+```
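
The formulas above are written as pseudocode. A runnable sketch of the same rules might look like the following. The `Issue` shape, parameter names, and sample values are placeholders chosen for illustration; only the 1.5 multiplier and the two gate conditions come from the specification.

```typescript
// Hypothetical issue shape; field names follow the variables in the pseudocode above.
interface Issue {
  severityWeight: number;
  exposureFactor: number;
  evidenceFreshnessPenalty: number;
  daysSinceEvidence: number;
  fixLanded: boolean;
  evidenceRefreshed: boolean;
}

const UNKNOWN_MULTIPLIER = 1.5; // from the unknown-penalty rule above

// Risk points per issue, including the penalty for stale evidence.
function riskPoints(issue: Issue, thresholdDays: number): number {
  let points = issue.severityWeight * issue.exposureFactor * issue.evidenceFreshnessPenalty;
  if (issue.daysSinceEvidence > thresholdDays) {
    points *= UNKNOWN_MULTIPLIER;
  }
  return points;
}

// Total outstanding risk; issues with a landed fix and refreshed evidence no longer count.
function actualRisk(issues: Issue[], thresholdDays: number): number {
  let total = 0;
  for (const issue of issues) {
    if (issue.fixLanded && issue.evidenceRefreshed) continue; // decay on fix
    total += riskPoints(issue, thresholdDays);
  }
  return total;
}

// Guardrail: fail when unknowns exceed K or actual risk exceeds the budget.
function gatePasses(risk: number, budget: number, unknowns: number, maxUnknowns: number): boolean {
  return !(unknowns > maxUnknowns || risk > budget);
}

const issues: Issue[] = [
  { severityWeight: 10, exposureFactor: 2, evidenceFreshnessPenalty: 1, daysSinceEvidence: 40, fixLanded: false, evidenceRefreshed: false },
  { severityWeight: 5, exposureFactor: 1, evidenceFreshnessPenalty: 1, daysSinceEvidence: 3, fixLanded: false, evidenceRefreshed: false },
  { severityWeight: 8, exposureFactor: 1, evidenceFreshnessPenalty: 1, daysSinceEvidence: 1, fixLanded: true, evidenceRefreshed: true },
];

const currentRisk = actualRisk(issues, 30);          // 30 + 5 + 0 = 35
const releasable = gatePasses(currentRisk, 100, 2, 5); // true
```

Skipping fixed issues in the sum is equivalent to the pseudocode's `subtract_points` step, but it avoids keeping a mutable running total.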
+
+## 5. Implementation Components
+
+### 5.1 Component Hierarchy
+
+```
+TriageCanvasComponent
+├── TriageListComponent
+│   ├── SeverityFilterComponent
+│   ├── VulnerabilityRowComponent
+│   └── BulkActionBarComponent
+├── TriageDetailComponent
+│   ├── AffectedPackagesPanel
+│   ├── AdvisoryRefsPanel
+│   ├── ReachabilityContextComponent
+│   └── EvidenceProvenanceComponent
+├── AiRecommendationPanel
+│   ├── ReachabilityExplanation
+│   ├── SuggestedJustification
+│   └── SimilarVulnsComponent
+├── VexDecisionModalComponent
+│   ├── StatusSelector
+│   ├── JustificationTypeSelector
+│   ├── EvidenceRefInput
+│   └── ScopeSelector
+└── VexHistoryComponent
+
+CompareViewComponent
+├── BaselineSelectorComponent
+├── TrustIndicatorsComponent
+├── DeltaSummaryStripComponent
+├── ThreePaneLayoutComponent
+│   ├── CategoriesPaneComponent
+│   ├── ItemsPaneComponent
+│   └── ProofPaneComponent
+├── ActionablesPanelComponent
+└── ExportActionsComponent
+
+RiskDashboardComponent
+├── BurnUpChartComponent
+├── UnknownsHeatmapComponent
+├── DeltaTableComponent
+├── ExceptionLedgerComponent
+└── KpiTilesComponent
+```
+
+### 5.2 Service Layer
+
+```typescript
+// Core services
+TriageService           // Vulnerability list + filtering
+VexDecisionService      // CRUD for VEX decisions
+AdvisoryAiService       // AI recommendations
+CompareService          // Baseline + delta computation
+RiskBudgetService       // Budget + burn tracking
+EvidenceService         // Evidence retrieval
+```
+
+## 6. API Integration
+
+### VulnExplorer Endpoints
+```
+GET  /api/v1/vulnerabilities                    // List with filters
+GET  /api/v1/vulnerabilities/{id}               // Detail
+GET  /api/v1/vulnerabilities/{id}/reachability  // Call graph slice
+POST /api/v1/vex-decisions                      // Create VEX decision
+PUT  /api/v1/vex-decisions/{id}                 // Update VEX decision
+GET  /api/v1/vex-decisions?vulnId={id}          // History for vuln
+```
+
+### AdvisoryAI Endpoints
+```
+POST /api/v1/advisory/plan                      // Get analysis plan
+POST /api/v1/advisory/execute                   // Execute analysis
+GET  /api/v1/advisory/output/{taskId}           // Get recommendations
+```
+
+### Delta/Compare Endpoints
+```
+GET  /api/v1/baselines/recommendations/{digest}
+POST /api/v1/delta/compute
+GET  /api/v1/delta/{id}/trust-indicators
+GET  /api/v1/actionables/delta/{id}
+```
+
+## 7. Implementation Status
+
+| Component | Sprint | Status |
+|-----------|--------|--------|
+| Risk Dashboard Base | SPRINT_20251226_004_FE | TODO |
+| Smart-Diff Compare View | SPRINT_20251226_012_FE | TODO |
+| Unified Triage Canvas | SPRINT_20251226_013_FE | TODO |
+| Documentation Consolidation | SPRINT_20251226_014_DOCS | TODO |
+| VEX Decision Models | VulnExplorer/Models | **COMPLETE** |
+| AdvisoryAI Pipeline | src/AdvisoryAI | **COMPLETE** |
+| Confidence Badge | Web/shared/components | **COMPLETE** |
+| Release Flow | Web/features/releases | **COMPLETE** |
+
+## 8. Testing Strategy
+
+### Unit Tests
+- Component behavior (selection, filtering, expansion)
+- Signal/computed derivations
+- Role-based view switching
+- Form validation (VEX decisions)
+
+### Integration Tests
+- API service calls and response handling
+- Navigation and routing
+- State persistence across route changes
+
+### E2E Tests
+- Full triage workflow: list → detail → VEX decision
+- Comparison workflow: select baseline → compute delta → export
+- Risk budget: view charts → create exception → see update
+
+### Accessibility Tests
+- Keyboard navigation completeness
+- Screen reader announcements
+- Color contrast compliance
+
+## 9. Success Metrics
+
+| Metric | Definition | Target |
+|--------|------------|--------|
+| Mean Time to Triage (MTTT) | Time from vuln notification to VEX decision | < 5 min |
+| Mean Time to Explain (MTTE) | Time from "why did this change?" to "Understood" click | < 2 min |
+| Triage Queue Throughput | Vulns triaged per hour per analyst | > 20 |
+| AI Recommendation Acceptance | % of AI suggestions accepted without modification | > 60% |
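
As an illustration of how the first metric in the table could be computed, the sketch below averages the time between notification and VEX decision and compares it with the 5-minute target. The record shape and field names are hypothetical; the specification does not say how the timestamps are captured.

```typescript
// Hypothetical record: ISO 8601 timestamps for one triaged vulnerability.
interface TriageRecord {
  notifiedAt: string; // when the vulnerability notification was raised
  decidedAt: string;  // when the VEX decision was recorded
}

const MTTT_TARGET_MINUTES = 5; // target from the metrics table above

// Returns the mean in minutes, or null when there is nothing to average.
function meanTimeToTriageMinutes(records: TriageRecord[]): number | null {
  if (records.length === 0) return null;
  let totalMs = 0;
  for (const record of records) {
    totalMs += Date.parse(record.decidedAt) - Date.parse(record.notifiedAt);
  }
  return totalMs / records.length / 60000;
}

const sampleRecords: TriageRecord[] = [
  { notifiedAt: "2025-12-26T10:00:00Z", decidedAt: "2025-12-26T10:03:00Z" },
  { notifiedAt: "2025-12-26T11:00:00Z", decidedAt: "2025-12-26T11:05:00Z" },
];

const mttt = meanTimeToTriageMinutes(sampleRecords); // 4
const meetsTarget = mttt !== null && mttt < MTTT_TARGET_MINUTES; // true
```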
+
+## 10. References
+
+### Archived Advisories (Consolidated Here)
+- `archived/2025-12-26-triage-advisories/25-Dec-2025 - Triage UI Lessons from Competitors.md`
+- `archived/2025-12-26-triage-advisories/25-Dec-2025 - Visual Diffs for Explainable Triage.md`
+- `archived/2025-12-26-triage-advisories/26-Dec-2026 - Visualizing the Risk Budget.md`
+
+### Related Documentation
+- `docs/modules/web/smart-diff-ui-architecture.md`
+- `docs/implplan/SPRINT_20251226_004_FE_risk_dashboard.md`
+- `docs/implplan/SPRINT_20251226_012_FE_smart_diff_compare.md`
+- `docs/implplan/SPRINT_20251226_013_FE_triage_canvas.md`
+
+### External References
+- [Snyk Reachability Analysis](https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/reachability-analysis)
+- [Anchore Vulnerability Annotations](https://docs.anchore.com/current/docs/vulnerability_management/vuln_annotations/)
+- [Prisma Cloud Runtime Defense](https://docs.prismacloud.io/en/compute-edition/30/admin-guide/runtime-defense/)