feat: Add archived advisories and implement smart-diff as a core evidence primitive

- Introduced new advisory documents for archived superseded advisories, including detailed descriptions of features already implemented or covered by existing sprints.
- Added "Smart-Diff as a Core Evidence Primitive" advisory outlining the treatment of SBOM diffs as first-class evidence objects, enhancing vulnerability verdicts with deterministic replayability.
- Created "Visual Diffs for Explainable Triage" advisory to improve user experience in understanding policy decisions and reachability changes through visual diffs.
- Implemented "Weighted Confidence for VEX Sources" advisory to rank conflicting vulnerability evidence based on freshness and confidence, facilitating better decision-making.
- Established a signer module charter detailing the mission, expectations, key components, and signing modes for cryptographic signing services in StellaOps.
- Consolidated overlapping concepts from triage UI, visual diffs, and risk budget visualization advisories into a unified specification for better clarity and implementation tracking.
StellaOps Bot
2025-12-26 13:01:43 +02:00
parent 22390057fc
commit 7792749bb4
50 changed files with 6844 additions and 130 deletions


@@ -0,0 +1,273 @@
# Keyless Signing Guide
## Overview
Keyless signing uses ephemeral X.509 certificates from Sigstore Fulcio, eliminating the need for persistent signing keys. This approach is ideal for CI/CD pipelines where key management is complex and error-prone.
### How It Works
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ CI Pipeline  │────▶│ OIDC Provider│────▶│    Fulcio    │────▶│    Rekor     │
│              │     │ (GitHub/GL)  │     │  (Sigstore)  │     │  (Sigstore)  │
│ 1. Get token │     │ 2. Issue JWT │     │ 3. Issue cert│     │ 4. Log entry │
│              │     │   (5 min)    │     │  (10 min)    │     │ (permanent)  │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
       │                                                             │
       │                                                             │
       └───────────── Attestation with cert + Rekor proof ───────────┘
```
1. **OIDC Token**: Pipeline requests identity token from CI platform
2. **Fulcio Certificate**: Token exchanged for short-lived signing certificate (~10 min)
3. **Ephemeral Key**: Private key exists only in memory during signing
4. **Rekor Logging**: Signature logged to transparency log for verification after cert expiry
### Key Benefits
| Benefit | Description |
|---------|-------------|
| **Zero Key Management** | No secrets to rotate, store, or protect |
| **Identity Binding** | Signatures tied to OIDC identity (repo, branch, workflow) |
| **Audit Trail** | All signatures logged to Rekor transparency log |
| **Short-lived Certs** | Minimizes exposure window (~10 minutes) |
| **Industry Standard** | Adopted by Kubernetes, npm, PyPI, and other major ecosystems |
## Quick Start
### Prerequisites
1. StellaOps CLI installed
2. CI platform with OIDC support (GitHub Actions, GitLab CI, Gitea)
3. Network access to Fulcio and Rekor (or private instances)
### GitHub Actions Example
```yaml
name: Sign Container Image
on:
  push:
    branches: [main]
jobs:
  build-and-sign:
    runs-on: ubuntu-latest
    permissions:
      id-token: write # Required for OIDC
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Build and Push Image
        id: build
        run: |
          docker build -t ghcr.io/${{ github.repository }}:${{ github.sha }} .
          docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
          echo "digest=$(docker inspect --format='{{index .RepoDigests 0}}' ghcr.io/${{ github.repository }}:${{ github.sha }} | cut -d@ -f2)" >> $GITHUB_OUTPUT
      - name: Keyless Sign
        uses: stella-ops/sign-action@v1
        with:
          artifact-digest: ${{ steps.build.outputs.digest }}
          artifact-type: image
```
### CLI Usage
```bash
# Sign with ambient OIDC token (in CI environment)
stella attest sign --keyless --artifact sha256:abc123...
# Sign with explicit token
STELLAOPS_OIDC_TOKEN="..." stella attest sign --keyless --artifact sha256:abc123...
# Verify signature (checks Rekor proof)
stella attest verify \
--artifact sha256:abc123... \
--certificate-identity "repo:myorg/myrepo:ref:refs/heads/main" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com"
```
## Configuration
### Signer Configuration
```yaml
# etc/signer.yaml
signer:
  signing:
    mode: "keyless"
    keyless:
      enabled: true
      fulcio:
        url: "https://fulcio.sigstore.dev"
        timeout: 30s
        retries: 3
      oidc:
        issuer: "https://authority.internal"
        clientId: "signer-keyless"
        useAmbientToken: true
      algorithms:
        preferred: "ECDSA_P256"
        allowed: ["ECDSA_P256", "Ed25519"]
      certificate:
        rootBundlePath: "/etc/stellaops/fulcio-roots.pem"
        validateChain: true
        requireSCT: true
```
### Private Fulcio Instance
For air-gapped or high-security environments, deploy a private Fulcio instance:
```yaml
signer:
  signing:
    keyless:
      fulcio:
        url: "https://fulcio.internal.example.com"
      oidc:
        issuer: "https://keycloak.internal.example.com/realms/stellaops"
      certificate:
        rootBundlePath: "/etc/stellaops/private-fulcio-roots.pem"
```
## Identity Verification
### Identity Constraints
When verifying signatures, specify which identities are trusted:
```bash
stella attest verify \
--artifact sha256:abc123... \
--certificate-identity "repo:myorg/myrepo:ref:refs/heads/main" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com"
```
### Platform Identity Patterns
#### GitHub Actions
| Pattern | Matches |
|---------|---------|
| `repo:org/repo:.*` | Any ref in repository |
| `repo:org/repo:ref:refs/heads/main` | Main branch only |
| `repo:org/repo:ref:refs/tags/v.*` | Version tags |
| `repo:org/repo:environment:production` | Production environment |
**Issuer:** `https://token.actions.githubusercontent.com`
#### GitLab CI
| Pattern | Matches |
|---------|---------|
| `project_path:group/project:.*` | Any ref in project |
| `project_path:group/project:ref_type:branch:ref:main` | Main branch |
| `project_path:group/project:ref_protected:true` | Protected refs only |
**Issuer:** `https://gitlab.com` (or self-hosted URL)
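The patterns in the tables above read like regular expressions. The source does not say how the verifier evaluates them, so the following is only a sketch under the assumption that a pattern must match the entire OIDC subject string. `matchesIdentity` is a hypothetical helper, not part of the StellaOps CLI.

```typescript
// Hypothetical helper for illustration; not a StellaOps API.
// Assumption: identity patterns are regular expressions matched against the whole subject.
function matchesIdentity(pattern: string, subject: string): boolean {
  // Anchor both ends so a pattern cannot match a subject that merely contains it.
  return new RegExp(`^(?:${pattern})$`).test(subject);
}

const mainOnly = "repo:org/repo:ref:refs/heads/main";
const anyRef = "repo:org/repo:.*";
const versionTags = "repo:org/repo:ref:refs/tags/v.*";

const results = {
  mainOnMain: matchesIdentity(mainOnly, "repo:org/repo:ref:refs/heads/main"),
  mainOnFeature: matchesIdentity(mainOnly, "repo:org/repo:ref:refs/heads/feature"),
  anyRefOnTag: matchesIdentity(anyRef, "repo:org/repo:ref:refs/tags/v1.2.0"),
  tagsOnTag: matchesIdentity(versionTags, "repo:org/repo:ref:refs/tags/v1.2.0"),
  // A subject from another namespace that embeds the trusted repository string.
  embedded: matchesIdentity(anyRef, "evil:repo:org/repo:ref:refs/heads/main"),
};
```

Anchoring is the design choice worth noting: without `^` and `$`, the last case would be accepted because the trusted string appears inside the untrusted subject.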
## Long-Term Verification
### The Problem
Fulcio certificates expire in ~10 minutes. How do you verify signatures months later?
### The Solution: Rekor Proofs
```
At signing time:
┌──────────────────────────────────────────────────────────────┐
│ Signature + Certificate + Signed-Certificate-Timestamp (SCT) │
│ ↓ │
│ Logged to Rekor │
│ ↓ │
│ Merkle Inclusion Proof returned │
└──────────────────────────────────────────────────────────────┘
At verification time (even years later):
┌──────────────────────────────────────────────────────────────┐
│ 1. Check signature is valid (using cert public key) │
│ 2. Check SCT proves cert was logged when valid │
│ 3. Check Rekor inclusion proof (entry was logged) │
│ 4. Check signing time was within cert validity window │
│ ↓ │
│ Signature is valid! ✓ │
└──────────────────────────────────────────────────────────────┘
```
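The four verification steps in the diagram can be sketched as a single check. This is a simplified illustration, not the StellaOps verifier: the cryptographic checks (signature, SCT, inclusion proof) are assumed to be done elsewhere and passed in as booleans, and all names are hypothetical.

```typescript
// Hypothetical input shape; the boolean fields stand in for real cryptographic checks.
interface LongTermCheckInput {
  signatureValid: boolean;      // step 1: signature verifies under the cert public key
  sctValid: boolean;            // step 2: SCT check passed
  inclusionProofValid: boolean; // step 3: Rekor inclusion proof check passed
  signingTime: Date;            // time recorded for the log entry
  certNotBefore: Date;
  certNotAfter: Date;
}

// Returns the names of the failed steps; an empty array means all four passed.
function verifyLongTerm(input: LongTermCheckInput): string[] {
  const failures: string[] = [];
  if (!input.signatureValid) failures.push("signature");
  if (!input.sctValid) failures.push("sct");
  if (!input.inclusionProofValid) failures.push("rekor-inclusion");
  // Step 4 compares the logged signing time, not the current time, to the cert window.
  const t = input.signingTime.getTime();
  if (t < input.certNotBefore.getTime() || t > input.certNotAfter.getTime()) {
    failures.push("signing-time");
  }
  return failures;
}

const base = {
  signatureValid: true,
  sctValid: true,
  inclusionProofValid: true,
  certNotBefore: new Date("2025-12-26T12:00:00Z"),
  certNotAfter: new Date("2025-12-26T12:10:00Z"),
};
const insideWindow = verifyLongTerm({ ...base, signingTime: new Date("2025-12-26T12:03:00Z") });
const afterExpiry = verifyLongTerm({ ...base, signingTime: new Date("2025-12-26T12:11:00Z") });
```

Note that the current date never enters the check, which is why a signature made inside the 10-minute window remains verifiable long after the certificate has expired.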
### Attestation Bundles
For air-gapped verification, StellaOps bundles attestations with proofs:
```bash
# Export bundle with Rekor proofs
stella attest export-bundle \
--image sha256:abc123... \
--include-proofs \
--output attestation-bundle.json
# Verify offline
stella attest verify --offline \
--bundle attestation-bundle.json \
--artifact sha256:abc123...
```
## Troubleshooting
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `OIDC token expired` | Token older than 5 minutes | Re-acquire token before signing |
| `Fulcio unavailable` | Network issues | Check connectivity, increase timeout |
| `Certificate chain invalid` | Wrong Fulcio roots | Update root bundle |
| `Identity mismatch` | Wrong verify constraints | Check issuer and identity patterns |
| `Rekor proof missing` | Logging failed | Retry signing, check Rekor status |
### Debug Mode
```bash
# Enable verbose logging
STELLAOPS_LOG_LEVEL=debug stella attest sign --keyless --artifact sha256:...
# Inspect certificate details
stella attest inspect --artifact sha256:... --show-cert
```
## Security Considerations
### Best Practices
1. **Always verify identity**: Never accept `.*` as the full identity pattern
2. **Require Rekor proofs**: Use `--require-rekor` for production verification
3. **Pin OIDC issuers**: Only trust expected issuers
4. **Use environment constraints**: Environment-scoped identities are more specific than branch names
5. **Monitor signing activity**: Alert on unexpected identities
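The first three practices lend themselves to an automated pre-flight check. The sketch below is one possible way to lint a verification policy before it is used. The `VerifyPolicy` shape and `lintVerifyPolicy` are hypothetical and do not correspond to StellaOps configuration keys.

```typescript
// Hypothetical policy shape for illustration; not StellaOps configuration.
interface VerifyPolicy {
  identityPattern: string;
  oidcIssuer: string | null;
  requireRekor: boolean;
}

// Returns human-readable findings; an empty array means none of the three checks fired.
function lintVerifyPolicy(policy: VerifyPolicy): string[] {
  const findings: string[] = [];
  const pattern = policy.identityPattern.trim();
  if (pattern === "" || pattern === ".*" || pattern === "^.*$") {
    findings.push("identity pattern matches every signer");
  }
  if (policy.oidcIssuer === null || policy.oidcIssuer.trim() === "") {
    findings.push("OIDC issuer is not pinned");
  }
  if (!policy.requireRekor) {
    findings.push("Rekor proof is not required");
  }
  return findings;
}

const strictFindings = lintVerifyPolicy({
  identityPattern: "repo:myorg/myrepo:environment:production",
  oidcIssuer: "https://token.actions.githubusercontent.com",
  requireRekor: true,
});
const looseFindings = lintVerifyPolicy({ identityPattern: ".*", oidcIssuer: null, requireRekor: false });
```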
### Threat Model
| Threat | Mitigation |
|--------|------------|
| Stolen OIDC token | Short lifetime (~5 min), audience binding |
| Fulcio compromise | Certificate Transparency (SCT), multiple roots |
| Rekor compromise | Multiple witnesses, checkpoints, consistency proofs |
| Private key theft | Ephemeral keys, never persisted |
## Related Documentation
- [Signer Architecture](../architecture.md)
- [Attestor Bundle Format](../../attestor/bundle-format.md)
- [Air-Gap Verification](../../../airgap/attestation-verification.md)
- [CI/CD Integration](../../../guides/cicd-signing.md)
## External Resources
- [Sigstore Documentation](https://docs.sigstore.dev/)
- [Fulcio Overview](https://docs.sigstore.dev/certificate_authority/overview/)
- [Rekor Transparency Log](https://docs.sigstore.dev/logging/overview/)
- [cosign Keyless Signing](https://docs.sigstore.dev/signing/quickstart/)


@@ -0,0 +1,348 @@
# Unified Triage Experience Specification
**Version:** 1.0
**Status:** Active
**Last Updated:** 2025-12-26
**Consolidated From:** 3 product advisories (see References)
## 1. Executive Summary
### The Problem
Modern container security generates overwhelming vulnerability data. Competitors offer fragmented solutions: Snyk provides reachability analysis, Anchore offers VEX annotations, Prisma Cloud delivers runtime signals. Security teams must context-switch between tools, losing precious time and context.
### The Stella Ops Solution
A **unified triage canvas** that combines:
- Rich evidence visualization with proof-carrying verdicts
- VEX decisioning as first-class policy objects
- AI-assisted analysis via AdvisoryAI
- Attestable exceptions with audit trails
- Offline-first architecture for air-gapped parity
## 2. Competitive Landscape
### Snyk — Reachability + Continuous Context
- Implements reachability analysis by building call graphs
- Factors reachability into priority scores
- Uses static analysis + AI + expert curation
- Tracks issues over time without re-scanning unchanged images
### Anchore — Vulnerability Annotations + VEX Export
- Vulnerability annotation workflows via UI/API
- Labels: "not applicable", "mitigated", "under investigation"
- Export as OpenVEX and CycloneDX VEX
- Downstream consumers receive curated exploitability state
### Prisma Cloud — Runtime Defense
- Continuous behavioral profiling
- Process, file, and network rule enforcement
- Learning models baseline expected behavior
- Runtime context during operational incidents
### Stella Ops Differentiation
| Feature | Snyk | Anchore | Prisma | Stella Ops |
|---------|------|---------|--------|------------|
| Reachability analysis | Yes | Partial | No | Yes (static + binary + runtime) |
| VEX as policy objects | No | Export only | No | **First-class** |
| Attestable exceptions | No | No | No | **Yes (DSSE)** |
| Offline replay | No | No | No | **Yes** |
| AI-assisted triage | Yes | No | No | Yes (AdvisoryAI) |
| Evidence graphs | Partial | No | Partial | **Full chain** |
## 3. Core UI Concepts
### 3.1 Visual Diff Pattern
Every policy decision or reachability change is treated as a **visual diff**, enabling quick, explainable triage.
#### Side-by-Side Panes
- **Before** (previous scan/policy) vs **After** (current)
- Show dependency/reachability subgraph
- Highlight added/removed/changed nodes/edges
#### Evidence Strip (Right Rail)
Human-readable facts used by the engine:
- Feature flag status (e.g., "feature flag OFF")
- Code path analysis (e.g., "code path unreachable")
- Runtime traces (e.g., "kernel eBPF trace absent")
#### Diff Verdict Header
```
Risk ↓ from Medium → Low (policy v1.8 → v1.9)
```
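A header of this shape could be produced by a small formatter. The sketch below assumes a four-level risk scale; only "Medium" and "Low" appear in the example above, so the other levels and the function name are illustrative.

```typescript
// Assumed risk scale, lowest to highest; only Low and Medium appear in the spec example.
type RiskLevel = "Low" | "Medium" | "High" | "Critical";
const RISK_ORDER: RiskLevel[] = ["Low", "Medium", "High", "Critical"];

// Hypothetical formatter for the diff verdict header.
function formatVerdictHeader(
  before: RiskLevel,
  after: RiskLevel,
  policyBefore: string,
  policyAfter: string,
): string {
  const change = RISK_ORDER.indexOf(after) - RISK_ORDER.indexOf(before);
  const risk =
    change === 0
      ? `Risk unchanged at ${after}`
      : `Risk ${change < 0 ? "↓" : "↑"} from ${before} → ${after}`;
  return `${risk} (policy ${policyBefore} → ${policyAfter})`;
}

const header = formatVerdictHeader("Medium", "Low", "v1.8", "v1.9");
// header === "Risk ↓ from Medium → Low (policy v1.8 → v1.9)"
```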
#### Filter Chips
Scope by: component, package, CVE, policy rule, environment
### 3.2 Data Models
```typescript
interface GraphSnapshot {
  nodes: GraphNode[];
  edges: GraphEdge[];
  metadata: { component: string; version: string; tags: string[] };
}
interface PolicySnapshot {
  version: string;
  rulesHash: string;
  inputs: { flags: Record<string, boolean>; env: string; vexSources: string[] };
}
interface Delta {
  added: DeltaItem[];
  removed: DeltaItem[];
  changed: DeltaItem[];
  ruleOutcomes: RuleOutcomeDelta[];
}
interface EvidenceItem {
  type: 'trace_hit' | 'sbom_line' | 'vex_claim' | 'config_value';
  source: string;
  digest: string;
  excerpt: string;
  timestamp: string;
}
interface SignedDeltaVerdict {
  status: 'routine' | 'review' | 'block';
  signatures: Signature[];
  producer: string;
}
```
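The spec leaves `GraphNode` and `DeltaItem` undefined, so the sketch below uses simplified stand-ins (`SketchNode`, `SketchDelta`) to show how the added, removed, and changed buckets of a `Delta` might be derived from two snapshots. Treating a flipped `reachable` flag as the only kind of change is an assumption made for brevity.

```typescript
// Simplified stand-ins for GraphNode and Delta; the real shapes are not given in the spec.
interface SketchNode { id: string; reachable: boolean; }
interface SketchDelta { added: string[]; removed: string[]; changed: string[]; }

function indexById(nodes: SketchNode[]): Record<string, SketchNode> {
  // Object.create(null) avoids accidental hits on inherited keys such as "toString".
  const index: Record<string, SketchNode> = Object.create(null);
  for (const node of nodes) index[node.id] = node;
  return index;
}

// Compares node sets by id; a node counts as changed when its reachability flips.
function diffNodes(before: SketchNode[], after: SketchNode[]): SketchDelta {
  const prev = indexById(before);
  const next = indexById(after);
  const delta: SketchDelta = { added: [], removed: [], changed: [] };
  for (const node of after) {
    const old = prev[node.id];
    if (old === undefined) delta.added.push(node.id);
    else if (old.reachable !== node.reachable) delta.changed.push(node.id);
  }
  for (const node of before) {
    if (next[node.id] === undefined) delta.removed.push(node.id);
  }
  return delta;
}

const nodeDelta = diffNodes(
  [{ id: "a", reachable: true }, { id: "b", reachable: true }, { id: "c", reachable: false }],
  [{ id: "a", reachable: true }, { id: "b", reachable: false }, { id: "d", reachable: true }],
);
// nodeDelta: added ["d"], removed ["c"], changed ["b"]
```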
### 3.3 Micro-Interactions
| Interaction | Behavior |
|-------------|----------|
| Hover changed node | Inline badge explaining "why it changed" |
| Click rule in rail | Spotlight the exact subgraph affected |
| Toggle "explain like I'm new" | Expands jargon into plain language |
| One-click "copy audit bundle" | Exports delta + evidence as attachment |
### 3.4 Keyboard Shortcuts
| Key | Action |
|-----|--------|
| `1` | Focus changes only |
| `2` | Show full graph |
| `E` | Expand evidence |
| `A` | Export audit bundle |
| `N` | Next item in queue |
| `P` | Previous item |
| `M` | Mark not affected |
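One way to wire the table above is a lookup map from key to action. The action names below are hypothetical. Two behaviors are assumptions not stated in the table: letter keys are matched case-insensitively, and shortcuts are ignored while the user is typing in a text field.

```typescript
// Hypothetical action identifiers; the spec names the actions only in prose.
type TriageAction =
  | "focusChanges"
  | "showFullGraph"
  | "expandEvidence"
  | "exportAuditBundle"
  | "nextItem"
  | "previousItem"
  | "markNotAffected";

const SHORTCUTS: Record<string, TriageAction> = {
  "1": "focusChanges",
  "2": "showFullGraph",
  E: "expandEvidence",
  A: "exportAuditBundle",
  N: "nextItem",
  P: "previousItem",
  M: "markNotAffected",
};

// Returns the action for a key press, or undefined when the key is unbound.
function resolveShortcut(key: string, targetIsTextInput: boolean): TriageAction | undefined {
  if (targetIsTextInput) return undefined; // assumption: never hijack typing
  return SHORTCUTS[key.toUpperCase()];
}
```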
## 4. Risk Budget Visualization
### 4.1 Concept
- **Risk budget** = allowable unresolved risk for a release (e.g., 100 "risk points")
- **Burn** = consumption rate as alerts appear, minus "payback" from fixes
### 4.2 Dashboard Components
#### Heatmap of Unknowns
| Component | Vulns | Compliance | Perf | Data | Supply Chain |
|-----------|-------|------------|------|------|--------------|
| Service A | 🟡 12 | 🟢 0 | 🟡 3 | 🔴 8 | 🟡 5 |
| Service B | 🔴 24 | 🟡 2 | 🟢 1 | 🟡 4 | 🟢 0 |
Cell value = unknowns count × severity weight
#### Delta Table (Risk Decay per Release)
| Release | Before | After | Retired | Shifted | Unknowns |
|---------|--------|-------|---------|---------|----------|
| v2.3.1 | 85 | 67 | -22 | +4 | 12 |
| v2.3.0 | 92 | 85 | -15 | +8 | 18 |
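Both sample rows are consistent with the relation After = Before + Retired + Shifted (for example, 85 − 22 + 4 = 67). The spec does not state this relation explicitly, so the check below treats it as an assumption. The `ReleaseDelta` shape is hypothetical.

```typescript
// Hypothetical row shape mirroring the delta table columns.
interface ReleaseDelta {
  release: string;
  before: number;
  after: number;
  retired: number; // negative: risk points removed by fixes
  shifted: number; // positive: risk points newly introduced or moved in
}

// Assumed invariant: after = before + retired + shifted.
function isConsistent(row: ReleaseDelta): boolean {
  return row.before + row.retired + row.shifted === row.after;
}

const rows: ReleaseDelta[] = [
  { release: "v2.3.1", before: 85, after: 67, retired: -22, shifted: 4 },
  { release: "v2.3.0", before: 92, after: 85, retired: -15, shifted: 8 },
];
const allConsistent = rows.every(isConsistent);
```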
#### Exception Ledger
Every accepted risk has: ID, owner, expiry, evidence note, auto-reminder.
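A ledger entry with those five fields might look like the following. The field names, the reminder lead time expressed in days, and the three states are all illustrative assumptions.

```typescript
// Hypothetical ledger entry; field names are illustrative.
interface RiskException {
  id: string;
  owner: string;
  expiresAt: Date;
  evidenceNote: string;
  reminderDaysBefore: number; // how long before expiry the auto-reminder fires
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Classifies an exception relative to a given point in time.
function exceptionState(entry: RiskException, now: Date): "active" | "reminder-due" | "expired" {
  const daysLeft = (entry.expiresAt.getTime() - now.getTime()) / MS_PER_DAY;
  if (daysLeft <= 0) return "expired";
  if (daysLeft <= entry.reminderDaysBefore) return "reminder-due";
  return "active";
}

const entry: RiskException = {
  id: "EXC-1",
  owner: "team-a",
  expiresAt: new Date("2026-01-31T00:00:00Z"),
  evidenceNote: "code path unreachable",
  reminderDaysBefore: 7,
};
```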
### 4.3 Risk Budget Burn-Up Chart
```
Risk Points
    ^
100 |__________ Budget Line (flat or stepped)
    |          \
 80 |           \  ← Actual Risk (cumulative)
    |            \
 60 |             \_____ Headroom (green)
    |                   \
 40 |                    \__ Target by release
    |
    +---------------------------------> Time
      T-30   T-14   T-7   T-2   Release
```
- **X-axis:** Calendar dates leading up to code freeze
- **Y-axis:** Risk points
- **Two lines:** Budget (flat/stepped) + Actual Risk (daily)
- **Shaded area:** Headroom (green) or Overrun (red)
- **Markers:** Feature freeze, pen-test, dependency bumps
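The shaded area can be derived per day as budget minus actual risk. The sketch below uses hypothetical names and made-up sample values. A negative result marks an overrun (red), anything else is headroom (green).

```typescript
// Hypothetical data point for the chart; values below are illustrative only.
interface BurnPoint { day: string; budget: number; actual: number; }
interface ShadedPoint { day: string; headroom: number; overrun: boolean; }

// headroom = budget - actual; a negative value means the budget is overrun.
function shade(points: BurnPoint[]): ShadedPoint[] {
  return points.map((p) => {
    const headroom = p.budget - p.actual;
    return { day: p.day, headroom, overrun: headroom < 0 };
  });
}

const shaded = shade([
  { day: "T-30", budget: 100, actual: 80 },
  { day: "T-7", budget: 100, actual: 110 },
]);
// shaded[0]: headroom 20, overrun false; shaded[1]: headroom -10, overrun true
```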
### 4.4 Computation Formulas
```typescript
// Risk points per issue
risk_points = severity_weight * exposure_factor * evidence_freshness_penalty;
// Unknown penalty: applied when the newest evidence is older than the threshold (N days)
if (days_since_evidence > threshold) {
  risk_points *= 1.5; // multiplier
}
// Decay on fix
if (fix_landed && evidence_refreshed) {
  subtract_points(issue.risk_points);
}
// Guardrails
if (unknowns > K || actual_risk > budget) {
  fail_gate();
}
```
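The formulas above are written as pseudocode. A runnable sketch might look like the following, where the field names, the sample weights, and the 30-day staleness threshold are assumptions chosen for illustration. Only the 1.5 multiplier and the gate condition come from the formulas.

```typescript
// Hypothetical issue shape; weights and factors are illustrative values.
interface Issue {
  severityWeight: number;
  exposureFactor: number;
  freshnessPenalty: number;
  daysSinceEvidence: number;
}

const STALE_MULTIPLIER = 1.5; // from the formulas above

// Risk points for one issue, with the unknown penalty applied to stale evidence.
function riskPoints(issue: Issue, staleAfterDays: number): number {
  let points = issue.severityWeight * issue.exposureFactor * issue.freshnessPenalty;
  if (issue.daysSinceEvidence > staleAfterDays) points *= STALE_MULTIPLIER;
  return points;
}

// The gate fails when unknowns exceed K or actual risk exceeds the budget.
function gatePasses(actualRisk: number, budget: number, unknowns: number, maxUnknowns: number): boolean {
  return !(unknowns > maxUnknowns || actualRisk > budget);
}

const fresh = riskPoints({ severityWeight: 10, exposureFactor: 0.5, freshnessPenalty: 1, daysSinceEvidence: 10 }, 30);
const stale = riskPoints({ severityWeight: 10, exposureFactor: 0.5, freshnessPenalty: 1, daysSinceEvidence: 45 }, 30);
// fresh === 5, stale === 7.5
```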
## 5. Implementation Components
### 5.1 Component Hierarchy
```
TriageCanvasComponent
├── TriageListComponent
│ ├── SeverityFilterComponent
│ ├── VulnerabilityRowComponent
│ └── BulkActionBarComponent
├── TriageDetailComponent
│ ├── AffectedPackagesPanel
│ ├── AdvisoryRefsPanel
│ ├── ReachabilityContextComponent
│ └── EvidenceProvenanceComponent
├── AiRecommendationPanel
│ ├── ReachabilityExplanation
│ ├── SuggestedJustification
│ └── SimilarVulnsComponent
├── VexDecisionModalComponent
│ ├── StatusSelector
│ ├── JustificationTypeSelector
│ ├── EvidenceRefInput
│ └── ScopeSelector
└── VexHistoryComponent
CompareViewComponent
├── BaselineSelectorComponent
├── TrustIndicatorsComponent
├── DeltaSummaryStripComponent
├── ThreePaneLayoutComponent
│ ├── CategoriesPaneComponent
│ ├── ItemsPaneComponent
│ └── ProofPaneComponent
├── ActionablesPanelComponent
└── ExportActionsComponent
RiskDashboardComponent
├── BurnUpChartComponent
├── UnknownsHeatmapComponent
├── DeltaTableComponent
├── ExceptionLedgerComponent
└── KpiTilesComponent
```
### 5.2 Service Layer
```typescript
// Core services
TriageService // Vulnerability list + filtering
VexDecisionService // CRUD for VEX decisions
AdvisoryAiService // AI recommendations
CompareService // Baseline + delta computation
RiskBudgetService // Budget + burn tracking
EvidenceService // Evidence retrieval
```
## 6. API Integration
### VulnExplorer Endpoints
```
GET /api/v1/vulnerabilities // List with filters
GET /api/v1/vulnerabilities/{id} // Detail
GET /api/v1/vulnerabilities/{id}/reachability // Call graph slice
POST /api/v1/vex-decisions // Create VEX decision
PUT /api/v1/vex-decisions/{id} // Update VEX decision
GET /api/v1/vex-decisions?vulnId={id} // History for vuln
```
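A client would typically centralize these paths rather than concatenate strings at each call site. The helper names below are hypothetical. Only the paths themselves come from the endpoint list above.

```typescript
// Hypothetical URL helpers; only the paths are taken from the endpoint list.
const API_BASE = "/api/v1";

function vulnerabilityUrl(id: string): string {
  // Encode the id so characters such as "/" cannot alter the path.
  return `${API_BASE}/vulnerabilities/${encodeURIComponent(id)}`;
}

function reachabilityUrl(id: string): string {
  return `${vulnerabilityUrl(id)}/reachability`;
}

function vexHistoryUrl(vulnId: string): string {
  return `${API_BASE}/vex-decisions?vulnId=${encodeURIComponent(vulnId)}`;
}

const detailUrl = vulnerabilityUrl("CVE-2024-1234");
const sliceUrl = reachabilityUrl("CVE-2024-1234");
const historyUrl = vexHistoryUrl("CVE-2024-1234");
```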
### AdvisoryAI Endpoints
```
POST /api/v1/advisory/plan // Get analysis plan
POST /api/v1/advisory/execute // Execute analysis
GET /api/v1/advisory/output/{taskId} // Get recommendations
```
### Delta/Compare Endpoints
```
GET /api/v1/baselines/recommendations/{digest}
POST /api/v1/delta/compute
GET /api/v1/delta/{id}/trust-indicators
GET /api/v1/actionables/delta/{id}
```
## 7. Implementation Status
| Component | Sprint | Status |
|-----------|--------|--------|
| Risk Dashboard Base | SPRINT_20251226_004_FE | TODO |
| Smart-Diff Compare View | SPRINT_20251226_012_FE | TODO |
| Unified Triage Canvas | SPRINT_20251226_013_FE | TODO |
| Documentation Consolidation | SPRINT_20251226_014_DOCS | TODO |
| VEX Decision Models | VulnExplorer/Models | **COMPLETE** |
| AdvisoryAI Pipeline | src/AdvisoryAI | **COMPLETE** |
| Confidence Badge | Web/shared/components | **COMPLETE** |
| Release Flow | Web/features/releases | **COMPLETE** |
## 8. Testing Strategy
### Unit Tests
- Component behavior (selection, filtering, expansion)
- Signal/computed derivations
- Role-based view switching
- Form validation (VEX decisions)
### Integration Tests
- API service calls and response handling
- Navigation and routing
- State persistence across route changes
### E2E Tests
- Full triage workflow: list → detail → VEX decision
- Comparison workflow: select baseline → compute delta → export
- Risk budget: view charts → create exception → see update
### Accessibility Tests
- Keyboard navigation completeness
- Screen reader announcements
- Color contrast compliance
## 9. Success Metrics
| Metric | Definition | Target |
|--------|------------|--------|
| Mean Time to Triage (MTTT) | Time from vuln notification to VEX decision | < 5 min |
| Mean Time to Explain (MTTE) | Time from "why did this change?" to "Understood" click | < 2 min |
| Triage Queue Throughput | Vulns triaged per hour per analyst | > 20 |
| AI Recommendation Acceptance | % of AI suggestions accepted without modification | > 60% |
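As an illustration of how the first metric could be computed, the sketch below averages the interval between notification and VEX decision. The record shape and the sample timestamps are hypothetical.

```typescript
// Hypothetical record of one triaged vulnerability.
interface TriageRecord { notifiedAt: Date; decidedAt: Date; }

// Mean Time to Triage in minutes; null when there is nothing to average.
function meanTimeToTriageMinutes(records: TriageRecord[]): number | null {
  if (records.length === 0) return null;
  let totalMs = 0;
  for (const r of records) totalMs += r.decidedAt.getTime() - r.notifiedAt.getTime();
  return totalMs / records.length / 60000;
}

const mttt = meanTimeToTriageMinutes([
  { notifiedAt: new Date("2025-12-26T10:00:00Z"), decidedAt: new Date("2025-12-26T10:03:00Z") },
  { notifiedAt: new Date("2025-12-26T10:10:00Z"), decidedAt: new Date("2025-12-26T10:15:00Z") },
]);
// mttt === 4 (a 3-minute and a 5-minute triage), which meets the < 5 min target
const meetsTarget = mttt !== null && mttt < 5;
```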
## 10. References
### Archived Advisories (Consolidated Here)
- `archived/2025-12-26-triage-advisories/25-Dec-2025 - Triage UI Lessons from Competitors.md`
- `archived/2025-12-26-triage-advisories/25-Dec-2025 - Visual Diffs for Explainable Triage.md`
- `archived/2025-12-26-triage-advisories/26-Dec-2026 - Visualizing the Risk Budget.md`
### Related Documentation
- `docs/modules/web/smart-diff-ui-architecture.md`
- `docs/implplan/SPRINT_20251226_004_FE_risk_dashboard.md`
- `docs/implplan/SPRINT_20251226_012_FE_smart_diff_compare.md`
- `docs/implplan/SPRINT_20251226_013_FE_triage_canvas.md`
### External References
- [Snyk Reachability Analysis](https://docs.snyk.io/manage-risk/prioritize-issues-for-fixing/reachability-analysis)
- [Anchore Vulnerability Annotations](https://docs.anchore.com/current/docs/vulnerability_management/vuln_annotations/)
- [Prisma Cloud Runtime Defense](https://docs.prismacloud.io/en/compute-edition/30/admin-guide/runtime-defense/)