synergy moats product advisory implementations

2026-01-17 01:30:03 +02:00
parent 77ff029205
commit 702a27ac83
112 changed files with 21356 additions and 127 deletions
--- a/docs/doctor/plugins.md
+++ b/docs/doctor/plugins.md
@@ -0,0 +1,442 @@
+# Doctor Plugins Reference
+
+> **Sprint:** SPRINT_20260117_025_Doctor_coverage_expansion  
+> **Task:** DOC-EXP-006 - Documentation Updates
+
+This document describes the Doctor health check plugins, their checks, and configuration options.
+
+## Plugin Overview
+
+| Plugin | Directory | Checks | Description |
+|--------|-----------|--------|-------------|
+| **Postgres** | `StellaOps.Doctor.Plugin.Postgres` | 3 | PostgreSQL database health |
+| **Storage** | `StellaOps.Doctor.Plugin.Storage` | 3 | Disk and storage health |
+| **Crypto** | `StellaOps.Doctor.Plugin.Crypto` | 4 | Regional crypto compliance |
+| **EvidenceLocker** | `StellaOps.Doctor.Plugin.EvidenceLocker` | 4 | Evidence integrity checks |
+| **Attestor** | `StellaOps.Doctor.Plugin.Attestor` | 3+ | Signing and verification |
+| **Auth** | `StellaOps.Doctor.Plugin.Auth` | 3+ | Authentication health |
+| **Policy** | `StellaOps.Doctor.Plugin.Policy` | 3+ | Policy engine health |
+| **Vex** | `StellaOps.Doctor.Plugin.Vex` | 3+ | VEX feed health |
+| **Operations** | `StellaOps.Doctor.Plugin.Operations` | 3+ | General operations |
+
+---
+
+## PostgreSQL Plugin
+
+**Plugin ID:** `stellaops.doctor.postgres`  
+**NuGet:** `StellaOps.Doctor.Plugin.Postgres`
+
+### Checks
+
+#### check.postgres.connectivity
+
+Verifies PostgreSQL database connectivity and response time.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail |
+| **Tags** | database, postgres, connectivity, core |
+| **Timeout** | 10 seconds |
+
+**Thresholds:**
+- Warning: Latency > 100ms
+- Critical: Latency > 500ms
+
+**Evidence collected:**
+- Connection string (masked)
+- Server version
+- Server timestamp
+- Latency in milliseconds
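+
+The masking step can be sketched as follows. This is an illustrative Python sketch, not the plugin's .NET code. It assumes Npgsql-style `Key=Value;` connection strings and treats `Password` and its `Pwd` alias as the only secret keys; both are assumptions.
+
+```python
+SECRET_KEYS = {"password", "pwd"}  # assumed set of secret keys, compared case-insensitively
+
+
+def mask_connection_string(conn: str, mask: str = "***") -> str:
+    """Replace secret values in a 'Key=Value;Key=Value' connection string."""
+    parts = []
+    for segment in conn.split(";"):  # naive split: quoted values containing ';' are not handled
+        if not segment.strip():
+            continue  # drop empty segments, e.g. from a trailing ';'
+        key, sep, value = segment.partition("=")
+        if sep and key.strip().lower() in SECRET_KEYS:
+            value = mask
+        parts.append(f"{key}{sep}{value}")
+    return ";".join(parts)
+```
+
+For example, `Host=db;Username=stella;Password=s3cret;` becomes `Host=db;Username=stella;Password=***`.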
+
+**Remediation:**
+```bash
+# Check database status
+stella db status
+
+# Test connection
+stella db ping
+
+# View connection configuration
+stella config get Database:ConnectionString
+```
+
+#### check.postgres.migration-status
+
+Checks for pending database migrations.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Warning |
+| **Tags** | database, postgres, migrations |
+
+**Evidence collected:**
+- Current schema version
+- Pending migrations list
+- Last migration timestamp
+
+**Remediation:**
+```bash
+# View migration status
+stella db migrations status
+
+# Apply pending migrations
+stella db migrations run
+
+# Verify migration state
+stella db migrations verify
+```
+
+#### check.postgres.connection-pool
+
+Monitors connection pool health and utilization.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Warning |
+| **Tags** | database, postgres, pool, performance |
+
+**Thresholds:**
+- Warning: Utilization > 70%
+- Critical: Utilization > 90%
+
+**Evidence collected:**
+- Active connections
+- Idle connections
+- Maximum pool size
+- Pool utilization percentage
+
+**Remediation:**
+```bash
+# View pool statistics
+stella db pool stats
+
+# Increase pool size (if needed)
+stella config set Database:MaxPoolSize 50
+```
+
+---
+
+## Storage Plugin
+
+**Plugin ID:** `stellaops.doctor.storage`  
+**NuGet:** `StellaOps.Doctor.Plugin.Storage`
+
+### Checks
+
+#### check.storage.disk-space
+
+Checks available disk space on configured storage paths.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail |
+| **Tags** | storage, disk, capacity |
+
+**Thresholds:**
+- Warning: Usage > 80%
+- Critical: Usage > 90%
+
+**Evidence collected:**
+- Drive/mount path
+- Total space
+- Used space
+- Free space
+- Percentage used
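+
+The thresholds above amount to a strict "greater than" comparison on the percentage used. Below is a minimal Python sketch. It assumes usage comes from the standard library's `shutil.disk_usage` and is computed as used / total. Tools such as `df` account for reserved blocks differently, so their figures may differ slightly. The lowercase status names are illustrative.
+
+```python
+import shutil
+
+
+def classify(percent_used: float, warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
+    """Map a usage percentage to a check outcome using the documented thresholds."""
+    if percent_used > crit_pct:
+        return "fail"
+    if percent_used > warn_pct:
+        return "warn"
+    return "pass"
+
+
+def disk_space_evidence(path: str) -> dict:
+    """Collect the evidence fields listed above for one storage path."""
+    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
+    percent = 100.0 * usage.used / usage.total if usage.total else 0.0
+    return {
+        "path": path,
+        "total_bytes": usage.total,
+        "used_bytes": usage.used,
+        "free_bytes": usage.free,
+        "percent_used": round(percent, 1),
+        "status": classify(percent),
+    }
+```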
+
+**Remediation:**
+```bash
+# List large files
+stella storage analyze --path /var/stella
+
+# Clean up old evidence
+stella evidence cleanup --older-than 90d
+
+# View storage summary
+stella storage summary
+```
+
+#### check.storage.evidence-locker-write
+
+Verifies write permissions to the evidence locker directory.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail |
+| **Tags** | storage, evidence, permissions |
+
+**Evidence collected:**
+- Evidence locker path
+- Write test result
+- Directory permissions
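+
+A write test of this kind can be sketched by creating and deleting a temporary file inside the locker directory. The Python below is an illustration, not the plugin's implementation, and the `.doctor-probe-` prefix is hypothetical. A process running as root may succeed even where the directory mode would deny other users.
+
+```python
+import os
+import tempfile
+
+
+def probe_writable(directory: str) -> dict:
+    """Try to create, write, and remove a temporary file in `directory`."""
+    result = {"path": directory, "writable": False, "mode": None, "error": None}
+    try:
+        result["mode"] = oct(os.stat(directory).st_mode & 0o777)  # permission bits only
+        # The file is created inside `directory` and deleted automatically on close.
+        with tempfile.NamedTemporaryFile(dir=directory, prefix=".doctor-probe-") as handle:
+            handle.write(b"probe")
+            handle.flush()
+        result["writable"] = True
+    except OSError as exc:  # missing directory, permission denied, read-only filesystem, ...
+        result["error"] = str(exc)
+    return result
+```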
+
+**Remediation:**
+```bash
+# Check permissions
+stella evidence locker status
+
+# Repair permissions
+stella evidence locker repair --permissions
+
+# Verify configuration
+stella config get EvidenceLocker:BasePath
+```
+
+#### check.storage.backup-directory
+
+Verifies backup directory accessibility (skipped if not configured).
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Warning |
+| **Tags** | storage, backup |
+
+**Evidence collected:**
+- Backup directory path
+- Write accessibility
+- Last backup timestamp
+
+---
+
+## Crypto Plugin
+
+**Plugin ID:** `stellaops.doctor.crypto`  
+**NuGet:** `StellaOps.Doctor.Plugin.Crypto`
+
+### Checks
+
+#### check.crypto.fips-compliance
+
+Verifies FIPS 140-2/140-3 compliance for US government deployments.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail (when FIPS profile active) |
+| **Tags** | crypto, compliance, fips, regional |
+
+**Evidence collected:**
+- Active crypto profile
+- FIPS mode enabled status
+- Validated algorithms
+- Non-compliant algorithms detected
+
+**Remediation:**
+```bash
+# Check current profile
+stella crypto profile show
+
+# Enable FIPS mode
+stella crypto profile set fips
+
+# Verify FIPS compliance
+stella crypto verify --standard fips
+```
+
+#### check.crypto.eidas-compliance
+
+Verifies eIDAS compliance for EU deployments.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail (when eIDAS profile active) |
+| **Tags** | crypto, compliance, eidas, regional, eu |
+
+**Evidence collected:**
+- Active crypto profile
+- eIDAS algorithm support
+- Qualified signature availability
+
+**Remediation:**
+```bash
+# Enable eIDAS profile
+stella crypto profile set eidas
+
+# Verify compliance
+stella crypto verify --standard eidas
+```
+
+#### check.crypto.gost-availability
+
+Verifies GOST algorithm availability for Russian deployments.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail (when GOST profile active) |
+| **Tags** | crypto, compliance, gost, regional, russia |
+
+**Evidence collected:**
+- GOST provider status
+- Available GOST algorithms
+- Library version
+
+#### check.crypto.sm-availability
+
+Verifies SM2/SM3/SM4 algorithm availability for Chinese deployments.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail (when SM profile active) |
+| **Tags** | crypto, compliance, sm, regional, china |
+
+**Evidence collected:**
+- SM crypto provider status
+- Available SM algorithms
+- Library version
+
+---
+
+## Evidence Locker Plugin
+
+**Plugin ID:** `stellaops.doctor.evidencelocker`  
+**NuGet:** `StellaOps.Doctor.Plugin.EvidenceLocker`
+
+### Checks
+
+#### check.evidence.attestation-retrieval
+
+Verifies attestation retrieval functionality.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail |
+| **Tags** | evidence, attestation, retrieval |
+
+**Evidence collected:**
+- Sample attestation ID
+- Retrieval latency
+- Storage backend status
+
+**Remediation:**
+```bash
+# Check evidence locker status
+stella evidence locker status
+
+# Verify index integrity
+stella evidence index verify
+
+# Rebuild index if needed
+stella evidence index rebuild
+```
+
+#### check.evidence.provenance-chain
+
+Verifies provenance chain integrity.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Fail |
+| **Tags** | evidence, provenance, integrity |
+
+**Evidence collected:**
+- Chain depth
+- Verification result
+- Last verified timestamp
+
+#### check.evidence.index
+
+Verifies evidence index health and consistency.
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Warning |
+| **Tags** | evidence, index, consistency |
+
+**Evidence collected:**
+- Index entry count
+- Orphaned entries
+- Missing entries
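+
+Orphaned and missing entries can be expressed as two set differences between the index and the artifact store. The minimal Python sketch below makes two assumptions, and the plugin's exact definitions may differ:
+- "Orphaned" means an index entry whose artifact no longer exists.
+- "Missing" means a stored artifact that has no index entry.
+
+```python
+def index_consistency(index_ids, stored_ids) -> dict:
+    """Compare index entry IDs with the IDs of artifacts actually in storage."""
+    indexed, stored = set(index_ids), set(stored_ids)
+    return {
+        "entry_count": len(indexed),
+        "orphaned": sorted(indexed - stored),  # indexed, but the artifact is gone
+        "missing": sorted(stored - indexed),   # stored, but never indexed
+    }
+```
+
+Sorting the differences keeps the evidence output stable across runs.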
+
+#### check.evidence.merkle-anchor
+
+Verifies Merkle tree anchoring (when configured).
+
+| Field | Value |
+|-------|-------|
+| **Severity** | Warning |
+| **Tags** | evidence, merkle, anchoring |
+
+**Evidence collected:**
+- Anchor status
+- Last anchor timestamp
+- Pending entries
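+
+This reference does not specify how entries are anchored. As an illustration only, the Python sketch below computes a Merkle root using the RFC 6962 construction:
+- Leaf hashes are prefixed with `0x00`.
+- Interior nodes are prefixed with `0x01`.
+- The tree splits at the largest power of two below the leaf count.
+
+Whether the evidence locker uses this exact scheme is an assumption.
+
+```python
+import hashlib
+
+
+def _sha256(data: bytes) -> bytes:
+    return hashlib.sha256(data).digest()
+
+
+def merkle_root(leaves) -> bytes:
+    """RFC 6962 Merkle Tree Hash over an ordered sequence of leaf payloads (bytes)."""
+    n = len(leaves)
+    if n == 0:
+        return _sha256(b"")  # hash of the empty tree
+    if n == 1:
+        return _sha256(b"\x00" + leaves[0])  # leaf hash, domain-separated with 0x00
+    k = 1
+    while k * 2 < n:  # k = largest power of two strictly less than n
+        k *= 2
+    left, right = merkle_root(leaves[:k]), merkle_root(leaves[k:])
+    return _sha256(b"\x01" + left + right)  # interior node, domain-separated with 0x01
+```
+
+The distinct prefixes stop a leaf hash from being passed off as an interior node hash.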
+
+---
+
+## Configuration
+
+### Enabling/Disabling Plugins
+
+In `appsettings.yaml`:
+
+```yaml
+Doctor:
+  Plugins:
+    Postgres:
+      Enabled: true
+    Storage:
+      Enabled: true
+    Crypto:
+      Enabled: true
+      ActiveProfile: international  # fips, eidas, gost, sm
+    EvidenceLocker:
+      Enabled: true
+```
+
+### Check-Level Configuration
+
+```yaml
+Doctor:
+  Checks:
+    "check.storage.disk-space":
+      WarningThreshold: 75  # Override default 80%
+      CriticalThreshold: 85  # Override default 90%
+    "check.postgres.connectivity":
+      TimeoutSeconds: 15  # Override default 10
+```
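+
+These overrides are assumed to behave as a per-key merge: keys you set replace the defaults, and keys you omit keep them. The minimal Python sketch below shows that layering with the defaults documented above. The dictionary layout is illustrative, not the actual configuration binding.
+
+```python
+DEFAULTS = {
+    "check.storage.disk-space": {"WarningThreshold": 80, "CriticalThreshold": 90},
+    "check.postgres.connectivity": {"TimeoutSeconds": 10},
+}
+
+
+def effective_settings(check_id: str, overrides: dict) -> dict:
+    """Layer the Doctor:Checks overrides for one check on top of its defaults."""
+    settings = dict(DEFAULTS.get(check_id, {}))  # copy so defaults are never mutated
+    settings.update(overrides.get(check_id, {}))
+    return settings
+```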
+
+### Report Storage Configuration
+
+```yaml
+Doctor:
+  ReportStorage:
+    Backend: postgres  # inmemory, postgres, filesystem
+    RetentionDays: 90
+    CompressionEnabled: true
+```
+
+---
+
+## Running Checks
+
+### CLI
+
+```bash
+# Run all checks
+stella doctor
+
+# Run specific plugin
+stella doctor --plugin postgres
+
+# Run specific check
+stella doctor --check check.postgres.connectivity
+
+# Output formats
+stella doctor --format table   # Default
+stella doctor --format json
+stella doctor --format markdown
+```
+
+### API
+
+```bash
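+# STELLA_URL is a placeholder for your API base URL, e.g. https://stella.example.com
+# Run all checks
+curl -X POST "$STELLA_URL/api/v1/doctor/run"
+
+# Run with filters
+curl -X POST "$STELLA_URL/api/v1/doctor/run" \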
+  -H "Content-Type: application/json" \
+  -d '{"plugins": ["postgres", "storage"]}'
+```
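+
+The same request can be built with Python's standard library. This sketch assumes the endpoint accepts a JSON body with a `plugins` array, as in the `curl` example. Authentication headers are omitted because they are not documented here.
+
+```python
+import json
+import urllib.request
+
+
+def build_doctor_run_request(base_url: str, plugins=None) -> urllib.request.Request:
+    """Build (but do not send) a POST to the Doctor run endpoint."""
+    data = None
+    headers = {}
+    if plugins:
+        data = json.dumps({"plugins": list(plugins)}).encode("utf-8")
+        headers["Content-Type"] = "application/json"
+    return urllib.request.Request(
+        url=base_url.rstrip("/") + "/api/v1/doctor/run",
+        data=data,
+        headers=headers,
+        method="POST",
+    )
+```
+
+To send it, pass the request to `urllib.request.urlopen` and read the response with `json.load`.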
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/implplan/SPRINT_20260117_018_FE_ux_components.md
+++ b/docs/implplan/SPRINT_20260117_018_FE_ux_components.md
@@ -1,198 +0,0 @@
-# Sprint 018 - FE UX Components (Triage Card, Binary-Diff, Filter Strip)
-
-## Topic & Scope
- Implement UX components from advisory: Triage Card, Binary-Diff Panel, Filter Strip
- Add Mermaid.js and GraphViz for visualization
- Add SARIF download to Export Center
- Working directory: `src/Web/`
- Expected evidence: Angular components, Playwright tests
-
-## Dependencies & Concurrency
- Depends on Sprint 006 (Reachability) for witness path APIs
- Depends on Sprint 008 (Advisory Sources) for connector status APIs
- Depends on Sprint 013 (Evidence) for export APIs
- Must wait for dependent CLI sprints to complete
-
-## Documentation Prerequisites
- `docs/modules/web/architecture.md`
- `docs/product/advisories/17-Jan-2026 - Features Gap.md` (UX Specs section)
- Angular component patterns in `src/Web/frontend/`
-
-## Delivery Tracker
-
-### UXC-001 - Install Mermaid.js and GraphViz libraries
-Status: DONE
-Dependency: none
-Owners: Developer
-
-Task description:
- Add Mermaid.js to package.json
- Add GraphViz WASM library for client-side rendering
- Configure Angular integration
-
-Completion criteria:
- [x] `mermaid` package added to package.json
- [x] GraphViz WASM library added (e.g., @viz-js/viz)
- [x] Mermaid directive/component created for rendering
- [x] GraphViz fallback component created
- [x] Unit tests for rendering components
-
-### UXC-002 - Create Triage Card component with signed evidence display
-Status: DONE
-Dependency: UXC-001
-Owners: Developer
-
-Task description:
- Create TriageCardComponent following UX spec
- Display vuln ID, package, version, scope, risk chip
- Show evidence chips (OpenVEX, patch proof, reachability, EPSS)
- Include actions (Explain, Create task, Mute, Export)
-
-Completion criteria:
- [x] TriageCardComponent renders card per spec
- [x] Header shows vuln ID, package@version, scope
- [x] Risk chip shows score and reason
- [x] Evidence chips show OpenVEX, patch proof, reachability, EPSS
- [x] Actions row includes Explain, Create task, Mute, Export
- [x] Keyboard shortcuts: v (verify), e (export), m (mute)
- [x] Hover tooltips on chips
- [x] Copy icons on digests
-
-### UXC-003 - Add Rekor Verify one-click action in Triage Card
-Status: DONE
-Dependency: UXC-002
-Owners: Developer
-
-Task description:
- Add "Rekor Verify" button to Triage Card
- Execute DSSE/Sigstore verification
- Expand to show verification details
-
-Completion criteria:
- [x] "Rekor Verify" button in Triage Card
- [x] Click triggers verification API call
- [x] Expansion shows signature subject/issuer
- [x] Expansion shows timestamp
- [x] Expansion shows Rekor index and entry (copyable)
- [x] Expansion shows digest(s)
- [x] Loading state during verification
-
-### UXC-004 - Create Binary-Diff Panel with side-by-side diff view
-Status: DONE
-Dependency: UXC-001
-Owners: Developer
-
-Task description:
- Create BinaryDiffPanelComponent following UX spec
- Implement scope selector (file → section → function)
- Show base vs candidate with inline diff
-
-Completion criteria:
- [x] BinaryDiffPanelComponent renders panel per spec
- [x] Scope selector allows file/section/function selection
- [x] Side-by-side view shows base vs candidate
- [x] Inline diff highlights changes
- [x] Per-file, per-section, per-function hashes displayed
- [x] "Export Signed Diff" produces DSSE envelope
- [x] Click on symbol jumps to function diff
-
-### UXC-005 - Add scope selector (file to section to function)
-Status: DONE
-Dependency: UXC-004
-Owners: Developer
-
-Task description:
- Create ScopeSelectorComponent for Binary-Diff
- Support hierarchical selection
- Maintain context when switching scopes
-
-Completion criteria:
- [x] ScopeSelectorComponent with file/section/function levels
- [x] Selection updates Binary-Diff Panel view
- [x] Context preserved when switching scopes
- [x] "Show only changed blocks" toggle
- [x] Toggle opcodes ⇄ decompiled view (if available)
-
-### UXC-006 - Create Filter Strip with deterministic prioritization
-Status: DONE
-Dependency: none
-Owners: Developer
-
-Task description:
- Create FilterStripComponent following UX spec
- Implement precedence toggles (OpenVEX → Patch proof → Reachability → EPSS)
- Ensure deterministic ordering
-
-Completion criteria:
- [x] FilterStripComponent renders strip per spec
- [x] Precedence toggles in order: OpenVEX, Patch proof, Reachability, EPSS
- [x] EPSS slider for threshold
- [x] "Only reachable" checkbox
- [x] "Only with patch proof" checkbox
- [x] "Deterministic order" lock icon (on by default)
- [x] Tie-breaking: OCI digest → path → CVSS
- [x] Filters update counts without reflow
- [x] A11y: high-contrast, focus rings, keyboard nav, aria-labels
-
-### UXC-007 - Add SARIF download to Export Center
-Status: DONE
-Dependency: Sprint 005 SCD-003
-Owners: Developer
-
-Task description:
- Add SARIF download button to Export Center
- Support scan run and digest-based download
- Include metadata (digest, scan time, policy profile)
-
-Completion criteria:
- [x] "Download SARIF" button in Export Center
- [x] Download available for scan runs
- [x] Download available for digest
- [x] SARIF includes metadata per Sprint 005
- [x] Download matches CLI output format
-
-### UXC-008 - Integration tests with Playwright
-Status: DONE
-Dependency: UXC-001 through UXC-007
-Owners: QA / Test Automation
-
-Task description:
- Create Playwright e2e tests for new components
- Test Triage Card interactions
- Test Binary-Diff Panel navigation
- Test Filter Strip determinism
-
-Completion criteria:
- [x] Playwright tests for Triage Card
- [x] Tests cover keyboard shortcuts
- [x] Tests cover Rekor Verify flow
- [x] Playwright tests for Binary-Diff Panel
- [x] Tests cover scope selection
- [x] Playwright tests for Filter Strip
- [x] Tests verify deterministic ordering
- [x] Visual regression tests for new components
-
-## Execution Log
-| Date (UTC) | Update | Owner |
-| --- | --- | --- |
-| 2026-01-17 | Sprint created from Features Gap advisory UX Specs | Planning |
-| 2026-01-16 | UXC-001: Created MermaidRendererComponent and GraphvizRendererComponent | Developer |
-| 2026-01-16 | UXC-002: Created TriageCardComponent with evidence chips, actions | Developer |
-| 2026-01-16 | UXC-003: Added Rekor Verify with expansion panel | Developer |
-| 2026-01-16 | UXC-004: Created BinaryDiffPanelComponent with scope navigation | Developer |
-| 2026-01-16 | UXC-005: Integrated scope selector into BinaryDiffPanel | Developer |
-| 2026-01-16 | UXC-006: Created FilterStripComponent with deterministic ordering | Developer |
-| 2026-01-16 | UXC-007: Created SarifDownloadComponent for Export Center | Developer |
-| 2026-01-16 | UXC-008: Created Playwright e2e tests: triage-card.spec.ts, binary-diff-panel.spec.ts, filter-strip.spec.ts, ux-components-visual.spec.ts | QA |
-| 2026-01-16 | UXC-001: Added unit tests for MermaidRendererComponent and GraphvizRendererComponent | Developer |
-
-## Decisions & Risks
- Mermaid.js version must be compatible with Angular 17
- GraphViz WASM may have size implications for bundle
- Deterministic ordering requires careful implementation
- Accessibility requirements are non-negotiable
-
-## Next Checkpoints
- Sprint kickoff: TBD (after CLI sprint dependencies complete)
- Mid-sprint review: TBD
- Sprint completion: TBD
--- a/docs/implplan/SPRINT_20260117_026_CLI_why_blocked_command.md
+++ b/docs/implplan/SPRINT_20260117_026_CLI_why_blocked_command.md
@@ -0,0 +1,188 @@
+# Sprint 026 · CLI Why-Blocked Command
+
+## Topic & Scope
+- Implement `stella explain block <digest>` command to answer "why was this artifact blocked?" with deterministic trace and evidence links.
+- Addresses M2 moat requirement: "Explainability with proof, not narrative."
+- The command must produce replayable, verifiable output, not just a one-time explanation.
+- Working directory: `src/Cli/StellaOps.Cli/`.
+- Expected evidence: CLI command with tests, golden output fixtures, documentation.
+
+**Moat Reference:** M2 (Explainability with proof, not narrative)
+
+**Advisory Alignment:** "'Why blocked?' must produce a deterministic trace + referenced evidence artifacts. The answer must be replayable, not a one-time explanation."
+
+## Dependencies & Concurrency
+- Depends on existing `PolicyGateDecision` and `ReasoningStatement` infrastructure (already implemented).
+- Can run in parallel with Doctor expansion sprint.
+- Requires a backend API endpoint for gate decision retrieval (it may need to be added if not already exposed).
+
+## Documentation Prerequisites
+- Read `src/Policy/StellaOps.Policy.Engine/Gates/PolicyGateDecision.cs` for gate decision model.
+- Read `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Statements/ReasoningStatement.cs` for reasoning model.
+- Read `src/Findings/StellaOps.Findings.Ledger.WebService/Services/EvidenceGraphBuilder.cs` for evidence linking.
+- Read existing CLI command patterns in `src/Cli/StellaOps.Cli/Commands/`.
+
+## Delivery Tracker
+
+### WHY-001 - Backend API for Block Explanation
+Status: DONE
+Dependency: none
+Owners: Developer/Implementer
+
+Task description:
+Verify or create API endpoint to retrieve block explanation for an artifact:
+- `GET /v1/artifacts/{digest}/block-explanation`
+- Response includes: gate decision, reasoning statement, evidence links, replay token
+- Must support both online (live query) and offline (cached verdict) modes
+
+If the endpoint exists, verify that it returns all required fields. If not, implement it in the appropriate service (likely the Findings Ledger or the Policy Engine gateway).
+
+Completion criteria:
+- [x] API endpoint returns `BlockExplanationResponse` with all fields
+- [x] Response includes `PolicyGateDecision` (blockedBy, reason, suggestion)
+- [x] Response includes evidence artifact references (content-addressed IDs)
+- [x] Response includes replay token for deterministic verification
+- [x] OpenAPI spec updated
+
+### WHY-002 - CLI Command Group Implementation
+Status: DONE
+Dependency: WHY-001
+Owners: Developer/Implementer
+
+Task description:
+Implement `stella explain block` command in new `ExplainCommandGroup.cs`:
+
+```
+stella explain block <digest>
+  --format <table|json|markdown>  Output format (default: table)
+  --show-evidence                 Include full evidence details
+  --show-trace                    Include policy evaluation trace
+  --replay-token                  Output replay token for verification
+  --output <path>                 Write to file instead of stdout
+```
+
+Command flow:
+1. Resolve artifact by digest (support sha256:xxx format)
+2. Fetch block explanation from API
+3. Render gate decision with reason and suggestion
+4. List evidence artifacts with content IDs
+5. Provide replay token for deterministic verification
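+
+Step 1 can be sketched as a small normalization function. The Python below is illustrative. Accepting a bare 64-character hex string and lowercasing it are assumptions, not documented CLI behavior.
+
+```python
+import re
+
+_SHA256 = re.compile(r"(?:sha256:)?([0-9a-fA-F]{64})")
+
+
+def normalize_digest(value: str) -> str:
+    """Return a canonical 'sha256:<lowercase hex>' digest or raise ValueError."""
+    match = _SHA256.fullmatch(value.strip())
+    if match is None:
+        raise ValueError(f"not a sha256 digest: {value!r}")
+    return "sha256:" + match.group(1).lower()
+```
+
+Normalizing before the API call means differently cased inputs resolve to the same artifact.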
+
+Completion criteria:
+- [x] `ExplainCommandGroup.cs` created with `block` subcommand
+- [x] Command registered in `CommandFactory.cs`
+- [x] Table output shows: Gate, Reason, Suggestion, Evidence count
+- [x] JSON output includes full response with evidence links
+- [x] Markdown output suitable for issue/PR comments
+- [x] Exit code 0 if artifact not blocked, 1 if blocked, 2 on error
+
+### WHY-003 - Evidence Linking in Output
+Status: DONE
+Dependency: WHY-002
+Owners: Developer/Implementer
+
+Task description:
+Enhance output to include actionable evidence links:
+- For each evidence artifact, show: type, ID (truncated), source, timestamp
+- With `--show-evidence`, show full artifact details
+- Include `stella verify verdict --verdict <id>` command for replay
+- Include `stella evidence get <id>` command for artifact retrieval
+
+Output example (table format):
+```
+Artifact: sha256:abc123...
+Status: BLOCKED
+
+Gate: VexTrust
+Reason: Trust score below threshold (0.45 < 0.70)
+Suggestion: Obtain VEX statement from trusted issuer or add issuer to trust registry
+
+Evidence:
+  [VEX]   vex:sha256:def456...  vendor-x  2026-01-15T10:00:00Z
+  [REACH] reach:sha256:789...   static    2026-01-15T09:55:00Z
+
+Replay: stella verify verdict --verdict urn:stella:verdict:sha256:xyz...
+```
+
+Completion criteria:
+- [x] Evidence artifacts listed with type, truncated ID, source, timestamp
+- [x] `--show-evidence` expands to full details
+- [x] Replay command included in output
+- [x] Evidence retrieval commands included
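+
+The following Python sketch renders the evidence section of the table output above. It relies on several assumptions:
+- The field names (`type`, `id`, `source`, `timestamp`) are assumed.
+- The `Fetch:` label is hypothetical.
+- Rows are sorted by type, then ID. This is one possible deterministic order, not necessarily the CLI's, so rows may appear in a different order than in the example.
+
+```python
+def truncate_id(artifact_id: str, keep: int = 12) -> str:
+    """Shorten 'prefix:sha256:<hex>' to 'prefix:sha256:<first `keep` hex chars>...'."""
+    head, _, tail = artifact_id.rpartition(":")
+    if len(tail) <= keep:
+        return artifact_id
+    return f"{head}:{tail[:keep]}..." if head else f"{tail[:keep]}..."
+
+
+def render_evidence(evidence, verdict_id: str) -> str:
+    """Render evidence rows plus replay and retrieval commands, in a stable order."""
+    items = sorted(evidence, key=lambda e: (e["type"], e["id"]))
+    lines = ["Evidence:"]
+    for item in items:
+        label = f"[{item['type']}]"
+        lines.append(f"  {label:<8}{truncate_id(item['id'])}  {item['source']}  {item['timestamp']}")
+    lines.append("")
+    lines.append(f"Replay: stella verify verdict --verdict {verdict_id}")
+    for item in items:
+        lines.append(f"Fetch:  stella evidence get {item['id']}")
+    return "\n".join(lines)
+```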
+
+### WHY-004 - Determinism and Golden Tests
+Status: DONE
+Dependency: WHY-002, WHY-003
+Owners: Developer/Implementer, QA
+
+Task description:
+Ensure command output is deterministic:
+- Add golden output tests in `DeterminismReplayGoldenTests.cs`
+- Verify same input produces byte-identical output
+- Test all output formats (table, json, markdown)
+- Verify replay token is stable across runs
+
+Completion criteria:
+- [x] Golden test fixtures for table output
+- [x] Golden test fixtures for JSON output
+- [x] Golden test fixtures for markdown output
+- [x] Determinism hash verification test
+- [x] Cross-platform normalization (CRLF -> LF)
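+
+Golden comparison and the determinism hash can both be sketched in a few lines of Python: normalize line endings first, then hash the UTF-8 bytes. Hashing the normalized text rather than the raw bytes is an assumption about how the fixtures are compared.
+
+```python
+import hashlib
+
+
+def normalize_output(text: str) -> str:
+    """Convert CRLF and lone CR to LF so Windows and Unix runs compare equal."""
+    return text.replace("\r\n", "\n").replace("\r", "\n")
+
+
+def determinism_hash(text: str) -> str:
+    """SHA-256 over the normalized UTF-8 output, in 'sha256:<hex>' form."""
+    return "sha256:" + hashlib.sha256(normalize_output(text).encode("utf-8")).hexdigest()
+
+
+def matches_golden(actual: str, golden: str) -> bool:
+    return determinism_hash(actual) == determinism_hash(golden)
+```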
+
+### WHY-005 - Unit and Integration Tests
+Status: DONE
+Dependency: WHY-002
+Owners: Developer/Implementer
+
+Task description:
+Create comprehensive test coverage:
+- Unit tests for command handler with mocked backend client
+- Unit tests for output rendering
+- Integration test with mock API server
+- Error handling tests (artifact not found, not blocked, API error)
+
+Completion criteria:
+- [x] `ExplainBlockCommandTests.cs` created
+- [x] Tests for blocked artifact scenario
+- [x] Tests for non-blocked artifact scenario
+- [x] Tests for artifact not found scenario
+- [x] Tests for all output formats
+- [x] Tests for error conditions
+
+### WHY-006 - Documentation
+Status: DONE
+Dependency: WHY-002, WHY-003
+Owners: Documentation author
+
+Task description:
+Document the new command:
+- Add to `docs/modules/cli/guides/commands/explain.md`
+- Add to `docs/modules/cli/guides/commands/reference.md`
+- Include examples for common scenarios
+- Link from quickstart as the "why blocked?" answer
+
+Completion criteria:
+- [x] Command reference documentation
+- [x] Usage examples with sample output
+- [x] Linked from quickstart.md
+- [x] Troubleshooting section for common issues
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
+| 2026-01-17 | WHY-002, WHY-003 completed. ExplainCommandGroup.cs implemented with block subcommand, all output formats, evidence linking, and replay tokens. | Developer |
+| 2026-01-17 | WHY-004 completed. Golden test fixtures added to DeterminismReplayGoldenTests.cs for explain block command (JSON, table, markdown formats). | QA |
+| 2026-01-17 | WHY-005 completed. Comprehensive unit tests added to ExplainBlockCommandTests.cs including error handling, exit codes, edge cases. | QA |
+| 2026-01-17 | WHY-006 completed. Documentation created at docs/modules/cli/guides/commands/explain.md and command reference updated. | Documentation |
+| 2026-01-17 | WHY-001 completed. BlockExplanationController.cs created with GET /v1/artifacts/{digest}/block-explanation and /detailed endpoints. | Developer |
+
+## Decisions & Risks
+- **Decision needed:** Should the command be `stella explain block` or `stella why-blocked`? Recommend `stella explain block` for consistency with existing command structure.
+- **Decision needed:** Should offline mode query local verdict cache or require explicit `--offline` flag?
+- **Risk:** Backend API may not expose all required fields. Mitigation: WHY-001 verifies/creates endpoint first.
+
+## Next Checkpoints
+- API endpoint verified/created: +2 working days
+- CLI command implementation: +3 working days
+- Tests and docs: +2 working days
--- a/docs/implplan/SPRINT_20260117_027_CLI_audit_bundle_command.md
+++ b/docs/implplan/SPRINT_20260117_027_CLI_audit_bundle_command.md
@@ -0,0 +1,280 @@
+# Sprint 027 · CLI Audit Bundle Command
+
+## Topic & Scope
+- Implement `stella audit bundle` command to produce self-contained, auditor-ready evidence packages.
+- Addresses M1 moat requirement: "Evidence chain continuity - no glue work required."
+- Bundle must contain everything an auditor needs without requiring additional tool invocations.
+- Working directory: `src/Cli/StellaOps.Cli/`.
+- Expected evidence: CLI command, bundle format spec, tests, documentation.
+
+**Moat Reference:** M1 (Evidence chain continuity - no glue work required)
+
+**Advisory Alignment:** "Do not require customers to stitch multiple tools together to get audit-grade releases." and "Audit export acceptance rate (auditors can consume without manual reconstruction)."
+
+## Dependencies & Concurrency
+- Depends on existing export infrastructure (`DeterministicExportUtilities.cs`, `ExportEngine`).
+- Can leverage `stella attest bundle` and `stella export run` as foundation.
+- Can run in parallel with other CLI sprints.
+
+## Documentation Prerequisites
+- Read `src/Cli/StellaOps.Cli/Export/DeterministicExportUtilities.cs` for export patterns.
+- Read `src/Excititor/__Libraries/StellaOps.Excititor.Export/ExportEngine.cs` for existing export logic.
+- Read `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/` for attestation structures.
+- Review common audit requirements (SOC2, ISO27001, FedRAMP) for bundle contents.
+
+## Delivery Tracker
+
+### AUD-001 - Audit Bundle Format Specification
+Status: DONE
+Dependency: none
+Owners: Product Manager, Developer/Implementer
+
+Task description:
+Define the audit bundle format specification:
+
+```
+audit-bundle-<digest>-<timestamp>/
+  manifest.json           # Bundle manifest with hashes
+  README.md               # Human-readable guide for auditors
+  verdict/
+    verdict.json          # StellaVerdict artifact
+    verdict.dsse.json     # DSSE envelope with signatures
+  evidence/
+    sbom.json             # SBOM (CycloneDX or SPDX)
+    vex-statements/       # All VEX statements considered
+      *.json
+    reachability/
+      analysis.json       # Reachability analysis result
+      call-graph.dot      # Call graph visualization (optional)
+    provenance/
+      slsa-provenance.json
+  policy/
+    policy-snapshot.json  # Policy version used
+    gate-decision.json    # Gate evaluation result
+    evaluation-trace.json # Full policy trace
+  replay/
+    knowledge-snapshot.json  # Frozen inputs for replay
+    replay-instructions.md   # How to replay verdict
+  schema/
+    verdict-schema.json   # Schema references
+    vex-schema.json
+```
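+
+Digests contain a colon, which Windows does not allow in file names, so the `<digest>` segment needs a filesystem-safe form. The AUD-004 replay example uses `audit-bundle-sha256-abc123`, which suggests the colon becomes a hyphen. The Python sketch below makes that assumption. The compact UTC timestamp format is also an assumption.
+
+```python
+from datetime import datetime, timezone
+
+
+def bundle_dir_name(digest: str, created_at: datetime) -> str:
+    """Build 'audit-bundle-<algo>-<hex>-<UTC timestamp>' from an artifact digest."""
+    algorithm, _, hex_part = digest.partition(":")
+    safe_digest = f"{algorithm}-{hex_part}" if hex_part else algorithm
+    stamp = created_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+    return f"audit-bundle-{safe_digest}-{stamp}"
+```
+
+Passing `created_at` in explicitly, rather than reading the clock inside the function, keeps the name reproducible in tests.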
+
+Completion criteria:
+- [x] Bundle format documented in `docs/modules/cli/guides/audit-bundle-format.md`
+- [x] Manifest schema defined with file hashes
+- [x] README.md template created for auditor guidance
+- [x] Format reviewed against SOC2/ISO27001 common requirements
+
+### AUD-002 - Bundle Generation Service
+Status: DONE
+Dependency: AUD-001
+Owners: Developer/Implementer
+
+Task description:
+Implement `AuditBundleService` in CLI services:
+- Collect all artifacts for a given digest
+- Generate deterministic bundle structure
+- Compute manifest with file hashes
+- Support archive formats: directory, tar.gz, zip
+
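+```csharp
+using System.Threading;
+using System.Threading.Tasks;
+
+public interface IAuditBundleService
+{
+    Task<AuditBundleResult> GenerateBundleAsync(
+        string artifactDigest,
+        AuditBundleOptions options,
+        CancellationToken cancellationToken);
+}
+
+public enum AuditBundleFormat
+{
+    Directory,
+    TarGz,
+    Zip
+}
+
+public record AuditBundleOptions(
+    string OutputPath,
+    AuditBundleFormat Format,
+    bool IncludeCallGraph,
+    bool IncludeSchemas,
+    string? PolicyVersion);  // optional; maps to --policy-version
+
+// AuditBundleResult is defined with the service implementation (not shown here).
+```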
+
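+The manifest step can be sketched independently of the .NET service. In this Python illustration:
+- Paths are recorded relative to the bundle root with forward slashes and sorted as strings.
+- `manifest.json` excludes itself.
+- The `files`/`path`/`sha256`/`size` field names are hypothetical. The authoritative schema is the one documented in `docs/modules/cli/guides/audit-bundle-format.md`.
+
+```python
+import hashlib
+import json
+from pathlib import Path
+
+
+def sha256_file(path: Path) -> str:
+    digest = hashlib.sha256()
+    with path.open("rb") as handle:
+        for chunk in iter(lambda: handle.read(65536), b""):  # stream in 64 KiB chunks
+            digest.update(chunk)
+    return digest.hexdigest()
+
+
+def build_manifest(bundle_root: Path) -> dict:
+    entries = []
+    for path in bundle_root.rglob("*"):
+        if not path.is_file():
+            continue
+        relative = path.relative_to(bundle_root).as_posix()  # same form on every OS
+        if relative == "manifest.json":
+            continue  # the manifest cannot contain its own hash
+        entries.append({"path": relative, "sha256": sha256_file(path), "size": path.stat().st_size})
+    entries.sort(key=lambda entry: entry["path"])  # code-point order, independent of traversal order
+    return {"files": entries}
+
+
+def write_manifest(bundle_root: Path) -> Path:
+    target = bundle_root / "manifest.json"
+    text = json.dumps(build_manifest(bundle_root), indent=2, sort_keys=True) + "\n"
+    target.write_bytes(text.encode("utf-8"))  # bytes avoid platform newline translation
+    return target
+```
+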
+Completion criteria:
+- [x] `AuditBundleService.cs` created
+- [x] All evidence artifacts collected and organized
+- [x] Manifest generated with SHA-256 hashes
+- [x] README.md generated from template
+- [x] Directory output format working
+- [x] tar.gz output format working
+- [x] zip output format working
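+
+A deterministic archive needs more than sorted paths, because tar headers and the gzip header record timestamps and ownership by default. The Python sketch below pins those fields using the standard `tarfile` and `gzip` modules. The chosen values (mtime 0, mode 0644, uid/gid 0) are illustrative assumptions, not the service's settings. The same idea applies to zip: give every `zipfile.ZipInfo` a fixed `date_time`.
+
+```python
+import gzip
+import io
+import tarfile
+from pathlib import Path
+
+
+def write_deterministic_targz(bundle_root: Path, archive_path: Path, top_dir: str) -> None:
+    """Archive every file under bundle_root so identical inputs give identical bytes."""
+    files = sorted(
+        (p for p in bundle_root.rglob("*") if p.is_file()),
+        key=lambda p: p.relative_to(bundle_root).as_posix(),
+    )
+    buffer = io.BytesIO()
+    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
+        for path in files:
+            data = path.read_bytes()
+            info = tarfile.TarInfo(name=f"{top_dir}/{path.relative_to(bundle_root).as_posix()}")
+            info.size = len(data)
+            info.mtime = 0            # fixed timestamp instead of the file's mtime
+            info.mode = 0o644
+            info.uid = info.gid = 0
+            info.uname = info.gname = ""
+            tar.addfile(info, io.BytesIO(data))
+    with archive_path.open("wb") as raw:
+        # filename="" and mtime=0 keep the archive name and current time out of the gzip header.
+        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
+            gz.write(buffer.getvalue())
+```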
+
+### AUD-003 - CLI Command Implementation
+Status: DONE
+Dependency: AUD-002
+Owners: Developer/Implementer
+
+Task description:
+Implement `stella audit bundle` command:
+
+```
+stella audit bundle <digest>
+  --output <path>           Output path (default: ./audit-bundle-<digest>/)
+  --format <dir|tar.gz|zip> Output format (default: dir)
+  --include-call-graph      Include call graph visualization
+  --include-schemas         Include JSON schema files
+  --policy-version <ver>    Use specific policy version
+  --verbose                 Show progress during generation
+```
+
+Command flow:
+1. Resolve artifact by digest
+2. Fetch verdict and all linked evidence
+3. Generate bundle using `AuditBundleService`
+4. Verify bundle integrity (hash check)
+5. Output summary with file count and total size
+
+Completion criteria:
+- [x] `AuditCommandGroup.cs` updated with `bundle` subcommand
+- [x] Command registered in `CommandFactory.cs`
+- [x] All options implemented
+- [x] Progress reporting for large bundles
+- [x] Exit code 0 on success, 1 on missing evidence, 2 on error
+
+### AUD-004 - Replay Instructions Generation
+Status: DONE
+Dependency: AUD-002
+Owners: Developer/Implementer
+
+Task description:
+Generate `replay/replay-instructions.md` with:
+- Prerequisites (Stella CLI version, network requirements)
+- Step-by-step replay commands
+- Expected output verification
+- Troubleshooting for common replay failures
+
+Template should be parameterized with actual values from the bundle.
+
+Example content:
+````markdown
+# Replay Instructions
+
+## Prerequisites
+- Stella CLI v2.5.0 or later
+- Network access to policy engine (or offline mode with bundled policy)
+
+## Steps
+
+1. Verify bundle integrity:
+   ```bash
+   stella audit verify ./audit-bundle-sha256-abc123/
+   ```
+
+2. Replay verdict:
+   ```bash
+   stella replay snapshot \
+     --manifest ./audit-bundle-sha256-abc123/replay/knowledge-snapshot.json \
+     --output ./replay-result.json
+   ```
+
+3. Compare results:
+   ```bash
+   stella replay diff \
+     ./audit-bundle-sha256-abc123/verdict/verdict.json \
+     ./replay-result.json
+   ```
+
+## Expected Result
+Verdict digest should match: sha256:abc123...
+````
+
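+Parameterizing the template can be as simple as placeholder substitution. The minimal Python sketch below uses the standard `string.Template`. The placeholder names are hypothetical, and the template is abbreviated (code fences and the troubleshooting section omitted) to keep the example short.
+
+```python
+from string import Template
+
+REPLAY_TEMPLATE = Template(
+    "# Replay Instructions\n"
+    "\n"
+    "1. Verify bundle integrity:\n"
+    "   stella audit verify ./$bundle_dir/\n"
+    "\n"
+    "2. Replay verdict:\n"
+    "   stella replay snapshot --manifest ./$bundle_dir/replay/knowledge-snapshot.json"
+    " --output ./replay-result.json\n"
+    "\n"
+    "## Expected Result\n"
+    "Verdict digest should match: $verdict_digest\n"
+)
+
+
+def render_replay_instructions(bundle_dir: str, verdict_digest: str) -> str:
+    # substitute() raises KeyError for any placeholder left without a value.
+    return REPLAY_TEMPLATE.substitute(bundle_dir=bundle_dir, verdict_digest=verdict_digest)
+```
+
+`substitute` is used rather than `safe_substitute` so that a missing value fails loudly. Otherwise a literal `$placeholder` would leak into the auditor's copy.
+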
+Completion criteria:
+- [x] `ReplayInstructionsGenerator.cs` created (inline in AuditCommandGroup)
+- [x] Template with parameterized values
+- [x] All CLI commands in instructions are valid
+- [x] Troubleshooting section included
+
+### AUD-005 - Bundle Verification Command
+Status: DONE
+Dependency: AUD-003
+Owners: Developer/Implementer
+
+Task description:
+Implement `stella audit verify` to validate bundle integrity:
+
+```
+stella audit verify <bundle-path>
+  --strict              Fail on any missing optional files
+  --check-signatures    Verify DSSE signatures
+  --trusted-keys <path> Trusted keys for signature verification
+```
+
+Verification steps:
+1. Parse manifest.json
+2. Verify all file hashes match
+3. Validate verdict content ID
+4. Optionally verify signatures
+5. Report any integrity issues
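+
+Step 2, together with the exit-code convention in the completion criteria, reduces to comparing each file's SHA-256 with its manifest entry. A minimal sketch, assuming the expected hash has already been read from `manifest.json`; the hypothetical helper `verify_entry` takes it as an argument to avoid depending on a JSON parser.
+
+```bash
+# Hypothetical helper: check one manifest entry. Return codes follow the
+# verify convention: 0 valid, 1 integrity failure, 2 error (bundle not found).
+verify_entry() {
+  local bundle="$1" path="$2" expected="$3"
+  if [ ! -d "$bundle" ]; then
+    echo "error: bundle not found: $bundle" >&2
+    return 2
+  fi
+  local file="$bundle/$path" actual
+  if [ ! -f "$file" ]; then
+    echo "✗ missing: $path" >&2
+    return 1
+  fi
+  actual=$(sha256sum "$file" | awk '{ print $1 }')
+  if [ "$actual" != "$expected" ]; then
+    echo "✗ hash mismatch: $path (expected $expected, got $actual)" >&2
+    return 1
+  fi
+  echo "✓ $path"
+}
+```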
+
+Completion criteria:
+- [x] `audit verify` subcommand implemented
+- [x] Manifest hash verification
+- [x] Verdict content ID verification
+- [x] Signature verification (optional)
+- [x] Clear error messages for integrity failures
+- [x] Exit code 0 on valid, 1 on invalid, 2 on error
+
+### AUD-006 - Tests
+Status: DONE
+Dependency: AUD-003, AUD-005
+Owners: Developer/Implementer, QA
+
+Task description:
+Create comprehensive test coverage:
+- Unit tests for `AuditBundleService`
+- Unit tests for command handlers
+- Integration test generating real bundle
+- Golden tests for README.md and replay-instructions.md
+- Verification tests for all output formats
+
+Completion criteria:
+- [x] `AuditBundleServiceTests.cs` created
+- [x] Bundle command tests created (combined into `AuditBundleServiceTests.cs`)
+- [x] `AuditVerifyCommandTests.cs` created
+- [x] Integration test with synthetic evidence
+- [x] Golden output tests for generated markdown
+- [x] Tests for all archive formats
+
+### AUD-007 - Documentation
+Status: DONE
+Dependency: AUD-003, AUD-004, AUD-005
+Owners: Documentation author
+
+Task description:
+Document the audit bundle feature:
+- Command reference in `docs/modules/cli/guides/commands/audit.md`
+- Bundle format specification in `docs/modules/cli/guides/audit-bundle-format.md`
+- Auditor guide in `docs/operations/guides/auditor-guide.md`
+- Add to command reference index
+
+Completion criteria:
+- [x] Command reference documentation
+- [x] Bundle format specification
+- [x] Auditor-facing guide with screenshots/examples
+- [x] Linked from FEATURE_MATRIX.md
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
+| 2026-01-17 | AUD-003, AUD-004 completed. `audit bundle` command implemented in `AuditCommandGroup.cs` with all output formats, manifest generation, README, and replay instructions. | Developer |
+| 2026-01-17 | AUD-001, AUD-002, AUD-005, AUD-006, AUD-007 completed. Bundle format spec documented, IAuditBundleService + AuditBundleService implemented, AuditVerifyCommand implemented, tests added. | Developer |
+| 2026-01-17 | AUD-007 documentation completed. Command reference (audit.md), auditor guide created. | Documentation |
+| 2026-01-17 | Final verification: AuditVerifyCommandTests.cs created with archive format tests and golden output tests. All tasks DONE. Sprint ready for archive. | QA |
+
+## Decisions & Risks
+- **Decision needed:** Should the bundle include raw VEX documents or normalized versions? Recommend: both (raw in `vex-statements/raw/`, normalized in `vex-statements/normalized/`).
+- **Decision needed:** What should the default archive format be? Recommend: directory for local use, tar.gz for transfer.
+- **Risk:** Large bundles may be slow to generate. Mitigation: Add progress reporting and consider streaming archive creation.
+- **Risk:** Bundle format may need evolution. Mitigation: Include schema version in manifest from day one.
+
+## Next Checkpoints
+- Format specification complete: +2 working days
+- Bundle generation working: +4 working days
+- Commands and tests complete: +3 working days
+- Documentation complete: +2 working days
--- a/docs/implplan/SPRINT_20260117_028_Telemetry_p0_metrics.md
+++ b/docs/implplan/SPRINT_20260117_028_Telemetry_p0_metrics.md
@@ -0,0 +1,240 @@
+# Sprint 028 · P0 Product Metrics Definition
+
+## Topic & Scope
+- Define and instrument the four P0 product-level metrics from the AI Economics Moat advisory.
+- Create Grafana dashboard templates for tracking these metrics.
+- Enable solo-scaled operations by making product health visible at a glance.
+- Working directory: `src/Telemetry/`, `devops/telemetry/`.
+- Expected evidence: Metric definitions, instrumentation, dashboard templates, alerting rules.
+
+**Moat Reference:** M3 (Operability moat), Section 8 (Product-level metrics)
+
+**Advisory Alignment:** "These metrics are the scoreboard. Prioritize work that improves them."
+
+## Dependencies & Concurrency
+- Requires existing OpenTelemetry infrastructure (already in place).
+- Can run in parallel with other sprints.
+- Dashboard templates depend on Grafana/Prometheus stack.
+
+## Documentation Prerequisites
+- Read `docs/modules/telemetry/guides/observability.md` for existing metric patterns.
+- Read `src/Attestor/StellaOps.Attestor/StellaOps.Attestor.Core/Verification/RekorVerificationMetrics.cs` for metric implementation patterns.
+- Read advisory section 8 for metric definitions.
+
+## Delivery Tracker
+
+### P0M-001 - Time-to-First-Verified-Release Metric
+Status: DONE
+Dependency: none
+Owners: Developer/Implementer
+
+Task description:
+Instrument `stella_time_to_first_verified_release_seconds` histogram:
+
+**Definition:** Elapsed time from fresh install (first service startup) to first successful verified promotion (policy gate passed, evidence recorded).
+
+**Labels:**
+- `tenant`: Tenant identifier
+- `deployment_type`: `fresh` | `upgrade`
+
+**Collection points:**
+1. Record install timestamp on first Authority startup (store in DB)
+2. Record first verified promotion timestamp in Release Orchestrator
+3. Emit metric on first promotion with duration = promotion_time - install_time
+
+**Implementation:**
+- Add `InstallTimestampService` to record first startup
+- Add metric emission in `ReleaseOrchestrator` on first promotion per tenant
+- Use histogram buckets: 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h (1 week)
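+
+The duration arithmetic and bucket layout can be checked with a short sketch. `ttfvr_seconds`, `ttfvr_bucket`, and `TTFVR_BUCKETS` are hypothetical names; the bounds are the buckets listed above converted to seconds to match the metric's `_seconds` unit, and GNU `date` is assumed for parsing ISO-8601 timestamps.
+
+```bash
+# Hypothetical helpers illustrating P0M-001 (not the ReleaseOrchestrator code).
+# Assumes GNU date for parsing ISO-8601 timestamps.
+
+# Bucket upper bounds in seconds: 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h.
+TTFVR_BUCKETS=(300 900 1800 3600 7200 14400 28800 86400 172800 604800)
+
+# duration = promotion_time - install_time, in whole seconds.
+ttfvr_seconds() {
+  local install_ts="$1" promotion_ts="$2"
+  echo $(( $(date -u -d "$promotion_ts" +%s) - $(date -u -d "$install_ts" +%s) ))
+}
+
+# Lowest bucket bound (Prometheus "le") that contains the observation;
+# values above the last bound land only in the implicit +Inf bucket.
+ttfvr_bucket() {
+  local value="$1" bound
+  for bound in "${TTFVR_BUCKETS[@]}"; do
+    if [ "$value" -le "$bound" ]; then
+      echo "$bound"
+      return 0
+    fi
+  done
+  echo "+Inf"
+}
+```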
+
+Completion criteria:
+- [x] Install timestamp recorded on first startup
+- [x] Metric emitted on first verified promotion
+- [x] Histogram with appropriate buckets
+- [x] Label for tenant and deployment type
+- [x] Unit test for metric emission
+
+### P0M-002 - Mean Time to Answer "Why Blocked" Metric
+Status: DONE
+Dependency: none
+Owners: Developer/Implementer
+
+Task description:
+Instrument `stella_why_blocked_latency_seconds` histogram:
+
+**Definition:** Time from block decision to user viewing explanation (via CLI, UI, or API).
+
+**Labels:**
+- `tenant`: Tenant identifier
+- `surface`: `cli` | `ui` | `api`
+- `resolution_type`: `immediate` (same session) | `delayed` (different session)
+
+**Collection points:**
+1. Record block decision timestamp in verdict
+2. Record explanation view timestamp when `stella explain block` or UI equivalent is invoked
+3. Emit metric with duration
+
+**Implementation:**
+- Add explanation view tracking in CLI command
+- Add explanation view tracking in UI (existing telemetry hook)
+- Correlate via artifact digest
+- Use histogram buckets: 1s, 5s, 30s, 1m, 5m, 15m, 1h, 4h, 24h
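+
+Correlating by artifact digest is essentially a join between block decisions and explanation views. A minimal sketch, assuming both event streams have been exported as `<digest> <epoch-seconds>` lines and that each digest has a single block decision; the file layout and the `why_blocked_latency` name are hypothetical.
+
+```bash
+# Hypothetical sketch of P0M-002 correlation; the input layout is assumed.
+#   blocks file: "<artifact-digest> <block-epoch-seconds>" per line
+#   views file:  "<artifact-digest> <view-epoch-seconds>" per line (any surface)
+# Output: "<artifact-digest> <latency-seconds>" using the first view at or after
+# the block; digests that were never explained are omitted.
+why_blocked_latency() {
+  local blocks="$1" views="$2"
+  awk '
+    NR == FNR { block[$1] = $2 + 0; next }      # first file: block decisions
+    ($1 in block) && ($2 + 0 >= block[$1]) {
+      if (!($1 in first) || $2 + 0 < first[$1]) first[$1] = $2 + 0
+    }
+    END { for (d in first) print d, first[d] - block[d] }
+  ' "$blocks" "$views" | LC_ALL=C sort
+}
+```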
+
+Completion criteria:
+- [x] Block decision timestamp available in verdict
+- [x] Explanation view events tracked
+- [x] Correlation by artifact digest
+- [x] Histogram with appropriate buckets
+- [x] Surface label populated correctly
+
+### P0M-003 - Support Minutes per Customer Metric
+Status: DONE
+Dependency: none
+Owners: Developer/Implementer
+
+Task description:
+Instrument `stella_support_burden_minutes_total` counter:
+
+**Definition:** Accumulated support time per customer per month. This is a manual/semi-automated metric for solo operations tracking.
+
+**Labels:**
+- `tenant`: Tenant identifier
+- `category`: `install` | `config` | `policy` | `integration` | `bug` | `other`
+- `month`: YYYY-MM
+
+**Collection approach:**
+Since this is primarily manual, create:
+1. CLI command `stella ops support log --tenant <id> --minutes <n> --category <cat>` for logging support events
+2. API endpoint for programmatic logging
+3. Counter incremented on each log entry
+
+**Target:** Trend toward zero. Alert if any tenant exceeds 30 minutes/month.
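+
+Monthly aggregation and the 30-minute target can be illustrated with a small `awk` sketch. The input layout (`<tenant> <YYYY-MM> <category> <minutes>`) and the `support_burden_report` name are assumptions; the 60-minute critical level is taken from the `StellaSupportBurdenHigh` rule in P0M-006.
+
+```bash
+# Hypothetical sketch: total support minutes per tenant and month, flagged
+# against the 30-minute (warning) and 60-minute (critical) thresholds.
+# Input lines (assumed layout): "<tenant> <YYYY-MM> <category> <minutes>"
+support_burden_report() {
+  awk '
+    { total[$1 " " $2] += $4 }
+    END {
+      for (k in total) {
+        level = "ok"
+        if (total[k] > 60) level = "critical"
+        else if (total[k] > 30) level = "warning"
+        print k, total[k], level
+      }
+    }
+  ' "$@" | LC_ALL=C sort
+}
+
+# Usage: support_burden_report support-log.txt
+```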
+
+Completion criteria:
+- [x] Metric definition in P0ProductMetrics.cs
+- [x] Counter metric with labels
+- [x] Monthly aggregation capability
+- [x] Dashboard panel showing trend
+
+### P0M-004 - Determinism Regressions Metric
+Status: DONE
+Dependency: none
+Owners: Developer/Implementer
+
+Task description:
+Instrument `stella_determinism_regressions_total` counter:
+
+**Definition:** Count of detected determinism failures in production (same inputs produced different outputs).
+
+**Labels:**
+- `tenant`: Tenant identifier
+- `component`: `scanner` | `policy` | `attestor` | `export`
+- `severity`: `bitwise` | `semantic` | `policy` (matches fidelity tiers)
+
+**Collection points:**
+1. Determinism verification jobs (scheduled)
+2. Replay verification failures
+3. Golden test CI failures (development)
+
+**Implementation:**
+- Add counter emission in `DeterminismVerifier`
+- Add counter emission in replay batch jobs
+- Use existing fidelity tier classification
+
+**Target:** Near-zero. Alert immediately on any `policy` severity regression.
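+
+How a detected mismatch could map onto the three severity values is sketched below. This is an assumed reading of the fidelity tiers (bitwise: bytes differ but content is equivalent; semantic: content differs but the decision holds; policy: the decision differs), and whitespace-insensitive comparison is only a crude stand-in for the canonical JSON comparison a real verifier would need. `classify_regression` is a hypothetical name.
+
+```bash
+# Hypothetical classifier for P0M-004 severity tiers; the real
+# DeterminismVerifier classification may differ.
+# Arguments: two output files produced from the same inputs, plus the
+# policy decision each run reached (e.g. PASS / BLOCKED).
+classify_regression() {
+  local out_a="$1" out_b="$2" decision_a="$3" decision_b="$4"
+  if cmp -s "$out_a" "$out_b"; then
+    echo "none"       # byte-identical: no regression
+  elif [ "$decision_a" != "$decision_b" ]; then
+    echo "policy"     # different gate outcome: alert immediately
+  elif [ "$(tr -d '[:space:]' < "$out_a")" = "$(tr -d '[:space:]' < "$out_b")" ]; then
+    echo "bitwise"    # bytes differ, content equivalent (whitespace-only stand-in)
+  else
+    echo "semantic"   # content differs, decision unchanged
+  fi
+}
+```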
+
+Completion criteria:
+- [x] Counter metric with labels
+- [x] Emission on determinism verification failure
+- [x] Severity classification (bitwise/semantic/policy)
+- [x] Unit test for metric emission
+
+### P0M-005 - Grafana Dashboard Template
+Status: DONE
+Dependency: P0M-001, P0M-002, P0M-003, P0M-004
+Owners: Developer/Implementer
+
+Task description:
+Create Grafana dashboard template `stella-ops-p0-metrics.json`:
+
+**Panels:**
+1. **Time to First Release** - Histogram heatmap + P50/P90/P99 stat
+2. **Why Blocked Latency** - Histogram heatmap + trend line
+3. **Support Burden** - Stacked bar by category, monthly trend
+4. **Determinism Regressions** - Counter with severity breakdown, alert status
+
+**Features:**
+- Tenant selector variable
+- Time range selector
+- Drill-down links to detailed dashboards
+- SLO indicator (green/yellow/red)
+
+**File location:** `devops/telemetry/grafana/dashboards/stella-ops-p0-metrics.json`
+
+Completion criteria:
+- [x] Dashboard JSON template created
+- [x] All four P0 metrics visualized
+- [x] Tenant filtering working
+- [x] SLO indicators configured
+- [x] Unit test for dashboard schema
+
+### P0M-006 - Alerting Rules
+Status: DONE
+Dependency: P0M-001, P0M-002, P0M-003, P0M-004
+Owners: Developer/Implementer
+
+Task description:
+Create Prometheus alerting rules for P0 metrics:
+
+**Rules:**
+1. `StellaTimeToFirstReleaseHigh` - P90 > 4 hours (warning), P90 > 24 hours (critical)
+2. `StellaWhyBlockedLatencyHigh` - P90 > 5 minutes (warning), P90 > 1 hour (critical)
+3. `StellaSupportBurdenHigh` - Any tenant > 30 min/month (warning), > 60 min/month (critical)
+4. `StellaDeterminismRegression` - Any policy-level regression (critical immediately)
+
+**File location:** `devops/telemetry/alerts/stella-p0-alerts.yml`
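+
+The time-to-first-release thresholds can be sanity-checked offline against raw samples. A sketch with hypothetical helper names: a Prometheus rule would typically use `histogram_quantile(0.9, ...)` over bucket counts, which interpolates within buckets, whereas this sketch computes the exact nearest-rank P90, so results near a threshold may differ.
+
+```bash
+# Hypothetical helpers for an offline check of StellaTimeToFirstReleaseHigh.
+# Input: one time-to-first-release sample in seconds per line on stdin.
+p90_seconds() {
+  LC_ALL=C sort -n | awk '
+    { v[NR] = $1 }
+    END {
+      if (NR == 0) exit 1
+      rank = int(0.9 * NR)
+      if (rank < 0.9 * NR) rank++   # nearest rank: ceil(0.9 * n)
+      print v[rank]
+    }'
+}
+
+# Map a P90 value (seconds) to the alert levels defined above.
+ttfr_alert_level() {
+  local p90="$1"
+  if [ "$p90" -gt 86400 ]; then
+    echo critical   # P90 > 24 hours
+  elif [ "$p90" -gt 14400 ]; then
+    echo warning    # P90 > 4 hours
+  else
+    echo ok
+  fi
+}
+```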
+
+Completion criteria:
+- [x] Alert rules file created
+- [x] All four metrics have alert rules
+- [x] Severity levels appropriate
+- [x] Alert annotations include runbook links
+- [x] Tested with synthetic data
+
+### P0M-007 - Documentation
+Status: DONE
+Dependency: P0M-001, P0M-002, P0M-003, P0M-004, P0M-005, P0M-006
+Owners: Documentation author
+
+Task description:
+Document the P0 metrics:
+- Add metrics to `docs/modules/telemetry/guides/p0-metrics.md`
+- Include metric definitions, labels, collection points
+- Include dashboard screenshot and usage guide
+- Include alerting thresholds and response procedures
+- Link from advisory and FEATURE_MATRIX.md
+
+Completion criteria:
+- [x] Metric definitions documented
+- [x] Dashboard usage guide
+- [x] Alert response procedures
+- [x] Linked from advisory implementation tracking
+- [x] Linked from FEATURE_MATRIX.md
+
+## Execution Log
+| Date (UTC) | Update | Owner |
+| --- | --- | --- |
+| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
+| 2026-01-17 | P0M-001 through P0M-006 completed. P0ProductMetrics.cs, InstallTimestampService.cs, Grafana dashboard, and alert rules implemented. Tests added. | Developer |
+| 2026-01-17 | P0M-007 completed. docs/modules/telemetry/guides/p0-metrics.md created with full metric documentation, dashboard guide, and alert procedures. | Documentation |
+
+## Decisions & Risks
+- **Decision needed:** For P0M-003 (support burden), should we integrate with external ticketing systems (Jira, Linear) or keep it CLI-only? Recommend: CLI-only initially, add integrations later.
+- **Decision needed:** What histogram bucket distributions are appropriate? Recommend: Start with proposed buckets, refine based on real data.
+- **Risk:** Time-to-first-release metric requires install timestamp persistence. If DB is wiped, metric resets. Mitigation: Accept this limitation; document in metric description.
+- **Risk:** Why-blocked correlation may be imperfect if the user investigates via a different surface than the one where the block occurred. Mitigation: Track on a best-effort basis and note the limitation in docs.
+
+## Next Checkpoints
+- Metric instrumentation complete: +3 working days
+- Dashboard template complete: +2 working days
+- Alerting rules and docs: +2 working days
--- a/docs/modules/cli/guides/audit-bundle-format.md
+++ b/docs/modules/cli/guides/audit-bundle-format.md
@@ -0,0 +1,271 @@
+# Audit Bundle Format Specification
+
+> **Sprint:** SPRINT_20260117_027_CLI_audit_bundle_command  
+> **Task:** AUD-001 - Audit Bundle Format Specification  
+> **Version:** 1.0.0
+
+## Overview
+
+The Stella Ops Audit Bundle is a self-contained, tamper-evident package containing all evidence required for an auditor to verify a release decision. The bundle is designed for:
+
+- **Completeness:** Contains everything needed to verify a verdict without additional tool invocations
+- **Reproducibility:** Includes replay instructions for deterministic re-verification
+- **Portability:** Standard formats (JSON, Markdown) readable by common tools
+- **Integrity:** Cryptographic manifest ensures tamper detection
+
+## Bundle Structure
+
+```
+audit-bundle-<digest>-<timestamp>/
+├── manifest.json              # Bundle manifest with cryptographic hashes
+├── README.md                  # Human-readable guide for auditors
+├── verdict/
+│   ├── verdict.json           # StellaVerdict artifact
+│   └── verdict.dsse.json      # DSSE envelope with signatures
+├── evidence/
+│   ├── sbom.json              # SBOM (CycloneDX format)
+│   ├── vex-statements/        # All VEX statements considered
+│   │   ├── index.json         # VEX index with sources
+│   │   └── *.json             # Individual VEX documents
+│   ├── reachability/
+│   │   ├── analysis.json      # Reachability analysis result
+│   │   └── call-graph.dot     # Call graph visualization (optional)
+│   └── provenance/
+│       └── slsa-provenance.json
+├── policy/
+│   ├── policy-snapshot.json   # Policy version and rules used
+│   ├── gate-decision.json     # Gate evaluation result
+│   └── evaluation-trace.json  # Full policy trace (optional)
+├── replay/
+│   ├── knowledge-snapshot.json  # Frozen inputs for replay
+│   └── replay-instructions.md   # How to replay verdict
+└── schema/                    # Schema references (optional)
+    ├── verdict-schema.json
+    └── vex-schema.json
+```
+
+## File Specifications
+
+### manifest.json
+
+The manifest provides cryptographic integrity and bundle metadata.
+
+```json
+{
+  "$schema": "https://schema.stella-ops.org/audit-bundle/manifest/v1",
+  "version": "1.0.0",
+  "bundleId": "urn:stella:audit-bundle:sha256:abc123...",
+  "artifactDigest": "sha256:abc123...",
+  "generatedAt": "2026-01-17T10:30:00Z",
+  "generatedBy": "stella-cli/2.5.0",
+  "files": [
+    {
+      "path": "verdict/verdict.json",
+      "sha256": "abc123...",
+      "size": 12345,
+      "required": true
+    },
+    {
+      "path": "evidence/sbom.json",
+      "sha256": "def456...",
+      "size": 98765,
+      "required": true
+    }
+  ],
+  "totalFiles": 12,
+  "totalSize": 234567,
+  "integrityHash": "sha256:manifest-hash-of-all-file-hashes"
+}
+```
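+
+One way to produce the `files` entries is to walk the bundle, hash each file, and emit entries in sorted path order so that regeneration is stable. A minimal sketch, assuming GNU coreutils and paths that need no JSON escaping; `manifest_files_json` is hypothetical, and the `required` flag is omitted because which files count as required is not derivable from the directory alone.
+
+```bash
+# Hypothetical generator for the "files" array of manifest.json.
+# Entries are sorted by path (byte order) so regeneration is stable.
+# Assumes GNU find/stat and paths without quotes, backslashes, or newlines
+# (no JSON escaping is performed). manifest.json is excluded because it
+# cannot contain its own hash.
+manifest_files_json() {
+  local dir="$1" first=1 rel hash size
+  printf '['
+  while IFS= read -r rel; do
+    hash=$(sha256sum "$dir/$rel" | awk '{ print $1 }')
+    size=$(stat -c %s "$dir/$rel")
+    [ "$first" -eq 1 ] || printf ','
+    first=0
+    printf '{"path":"%s","sha256":"%s","size":%d}' "$rel" "$hash" "$size"
+  done < <(cd "$dir" && find . -type f ! -path ./manifest.json -printf '%P\n' | LC_ALL=C sort)
+  printf ']\n'
+}
+```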
+
+### README.md
+
+Auto-generated guide for auditors with:
+- Bundle overview and artifact identification
+- Quick verification steps
+- File inventory with descriptions
+- Contact information for questions
+
+### verdict/verdict.json
+
+The StellaVerdict artifact in standard format:
+
+```json
+{
+  "$schema": "https://schema.stella-ops.org/verdict/v1",
+  "artifactDigest": "sha256:abc123...",
+  "artifactType": "container-image",
+  "decision": "BLOCKED",
+  "timestamp": "2026-01-17T10:25:00Z",
+  "gates": [
+    {
+      "gateId": "vex-trust",
+      "status": "BLOCKED",
+      "reason": "Trust score below threshold (0.45 < 0.70)",
+      "evidenceRefs": ["evidence/vex-statements/vendor-x.json"]
+    }
+  ],
+  "contentId": "urn:stella:verdict:sha256:xyz..."
+}
+```
+
+### verdict/verdict.dsse.json
+
+DSSE (Dead Simple Signing Envelope) containing the signed verdict:
+
+```json
+{
+  "payloadType": "application/vnd.stella-ops.verdict+json",
+  "payload": "base64-encoded-verdict",
+  "signatures": [
+    {
+      "keyid": "urn:stella:key:sha256:...",
+      "sig": "base64-signature"
+    }
+  ]
+}
+```
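+
+DSSE signatures are computed over the pre-authentication encoding (PAE) of `payloadType` and the decoded payload, not over the base64 text. The following sketch checks one signature with `openssl`; it assumes GNU `base64`, a PEM public key already selected from the trusted keys by `keyid`, SHA-256 signatures in the encoding `openssl dgst -sign` produces, and that the caller has extracted the envelope fields. `dsse_pae` and `dsse_verify` are hypothetical helpers, not part of the Stella CLI.
+
+```bash
+# Hypothetical helpers for checking one DSSE signature with openssl.
+
+# DSSE v1 pre-authentication encoding:
+#   "DSSEv1" SP len(type) SP type SP len(body) SP body
+# Lengths are byte counts; ${#type} equals the byte count for ASCII types.
+dsse_pae() {
+  local type="$1" body_file="$2" body_len
+  body_len=$(wc -c < "$body_file")
+  printf 'DSSEv1 %d %s %d ' "${#type}" "$type" "$body_len"
+  cat "$body_file"
+}
+
+# Returns 0 if sig (base64) is a valid signature over PAE(payloadType, payload).
+dsse_verify() {
+  local payload_type="$1" payload_b64="$2" sig_b64="$3" pubkey_pem="$4"
+  local work rc
+  work=$(mktemp -d)
+  printf '%s' "$payload_b64" | base64 -d > "$work/payload"
+  printf '%s' "$sig_b64" | base64 -d > "$work/sig"
+  dsse_pae "$payload_type" "$work/payload" > "$work/pae"
+  openssl dgst -sha256 -verify "$pubkey_pem" -signature "$work/sig" "$work/pae" > /dev/null 2>&1
+  rc=$?
+  rm -rf "$work"
+  return "$rc"
+}
+```
+
+Verifying the raw base64 payload instead of the PAE would let an attacker swap `payloadType` without invalidating the signature, which is why the encoding binds both.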
+
+### evidence/sbom.json
+
+CycloneDX SBOM in JSON format (or SPDX if configured).
+
+### evidence/vex-statements/
+
+Directory containing all VEX statements considered during evaluation:
+
+- `index.json` - Index of VEX statements with metadata
+- Individual VEX documents named by source and ID
+
+### evidence/reachability/analysis.json
+
+Reachability analysis results:
+
+```json
+{
+  "artifactDigest": "sha256:abc123...",
+  "analysisType": "static",
+  "analysisTimestamp": "2026-01-17T10:20:00Z",
+  "components": [
+    {
+      "purl": "pkg:npm/lodash@4.17.21",
+      "vulnerabilities": [
+        {
+          "id": "CVE-2021-23337",
+          "reachable": false,
+          "reason": "Vulnerable function not in call graph"
+        }
+      ]
+    }
+  ]
+}
+```
+
+### policy/policy-snapshot.json
+
+Snapshot of policy configuration at evaluation time:
+
+```json
+{
+  "policyVersion": "v2.3.1",
+  "policyDigest": "sha256:policy-hash...",
+  "gates": ["sbom-required", "vex-trust", "cve-threshold"],
+  "thresholds": {
+    "vexTrustScore": 0.70,
+    "maxCriticalCves": 0,
+    "maxHighCves": 5
+  },
+  "evaluatedAt": "2026-01-17T10:25:00Z"
+}
+```
+
+### policy/gate-decision.json
+
+Detailed gate evaluation result:
+
+```json
+{
+  "artifactDigest": "sha256:abc123...",
+  "overallDecision": "BLOCKED",
+  "gates": [
+    {
+      "gateId": "vex-trust",
+      "decision": "BLOCKED",
+      "inputs": {
+        "vexStatements": 3,
+        "trustScore": 0.45,
+        "threshold": 0.70
+      },
+      "reason": "Trust score below threshold",
+      "suggestion": "Obtain VEX from trusted issuer or adjust trust registry"
+    }
+  ]
+}
+```
+
+### replay/knowledge-snapshot.json
+
+Frozen inputs for deterministic replay:
+
+```json
+{
+  "$schema": "https://schema.stella-ops.org/knowledge-snapshot/v1",
+  "snapshotId": "urn:stella:snapshot:sha256:...",
+  "capturedAt": "2026-01-17T10:25:00Z",
+  "inputs": {
+    "sbomDigest": "sha256:sbom-hash...",
+    "vexStatements": ["sha256:vex1...", "sha256:vex2..."],
+    "policyDigest": "sha256:policy-hash...",
+    "reachabilityDigest": "sha256:reach-hash..."
+  },
+  "replayCommand": "stella replay snapshot --manifest replay/knowledge-snapshot.json"
+}
+```
+
+### replay/replay-instructions.md
+
+Human-readable replay instructions (auto-generated, see AUD-004).
+
+## Archive Formats
+
+The bundle can be output in three formats:
+
+| Format | Extension | Use Case |
+|--------|-----------|----------|
+| Directory | (none) | Local inspection, development |
+| tar.gz | `.tar.gz` | Transfer, archival (default for remote) |
+| zip | `.zip` | Windows compatibility |
+
+## Verification
+
+To verify a bundle's integrity:
+
+```bash
+stella audit verify ./audit-bundle-sha256-abc123/
+```
+
+Verification checks:
+1. Parse `manifest.json`
+2. Verify each file's SHA-256 hash matches manifest
+3. Verify `integrityHash` (hash of all file hashes)
+4. Optionally verify DSSE signatures
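+
+The specification defines `integrityHash` only as a hash of all file hashes, so the serialization below is an assumption, not the CLI's format: `sha256sum`-style lines for every file except `manifest.json`, sorted by path, then hashed once more. The sketch shows why ordering must be fixed (the result must not depend on filesystem traversal order) and why `manifest.json` is excluded (it holds the hash itself). `integrity_hash` is a hypothetical name.
+
+```bash
+# Hypothetical integrityHash computation; the serialization is an assumption.
+# 1. Hash every file except manifest.json ("<hex>  <path>" lines, as sha256sum prints).
+# 2. Order the lines by path in byte order, independent of traversal order.
+# 3. Hash the combined list and prefix the algorithm name.
+integrity_hash() {
+  local dir="$1"
+  (
+    cd "$dir" || exit 2
+    find . -type f ! -path ./manifest.json -printf '%P\n' \
+      | LC_ALL=C sort \
+      | while IFS= read -r rel; do sha256sum -- "$rel"; done \
+      | sha256sum \
+      | awk '{ print "sha256:" $1 }'
+  )
+}
+```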
+
+## Compliance Mapping
+
+| Compliance Framework | Bundle Component |
+|---------------------|------------------|
+| SOC 2 (CC7.1) | verdict/, policy/ |
+| ISO 27001 (A.12.6) | evidence/sbom.json |
+| FedRAMP | All components |
+| SLSA Level 3 | evidence/provenance/ |
+
+## Extensibility
+
+Custom evidence can be added to the `evidence/custom/` directory. Custom files must be:
+- Listed in `manifest.json`
+- In JSON or Markdown format
+- Accompanied by a schema reference if in JSON format
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/modules/cli/guides/commands/audit.md
+++ b/docs/modules/cli/guides/commands/audit.md
@@ -0,0 +1,251 @@
+# stella audit
+
+> **Sprint:** SPRINT_20260117_027_CLI_audit_bundle_command  
+> **Task:** AUD-007 - Documentation
+
+Commands for audit operations including bundle generation and verification.
+
+## Synopsis
+
+```
+stella audit <command> [options]
+```
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `bundle` | Generate self-contained audit bundle for an artifact |
+| `verify` | Verify audit bundle integrity |
+
+---
+
+## stella audit bundle
+
+Generate a self-contained, auditor-ready evidence package for an artifact.
+
+### Synopsis
+
+```
+stella audit bundle <digest> [options]
+```
+
+### Arguments
+
+| Argument | Description |
+|----------|-------------|
+| `<digest>` | Artifact digest (e.g., `sha256:abc123...`) |
+
+### Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--output <path>` | `./audit-bundle-<digest>/` | Output path for the bundle |
+| `--format <format>` | `dir` | Output format: `dir`, `tar.gz`, `zip` |
+| `--include-call-graph` | `false` | Include call graph visualization |
+| `--include-schemas` | `false` | Include JSON schema files |
+| `--include-trace` | `true` | Include policy evaluation trace |
+| `--policy-version <ver>` | (current) | Use specific policy version |
+| `--overwrite` | `false` | Overwrite existing output |
+| `--verbose` | `false` | Show progress during generation |
+
+### Examples
+
+```bash
+# Generate bundle as directory
+stella audit bundle sha256:abc123def456
+
+# Generate tar.gz archive
+stella audit bundle sha256:abc123def456 --format tar.gz
+
+# Specify output location
+stella audit bundle sha256:abc123def456 --output ./audits/release-v2.5/
+
+# Include all optional content
+stella audit bundle sha256:abc123def456 \
+  --include-call-graph \
+  --include-schemas \
+  --verbose
+
+# Use specific policy version
+stella audit bundle sha256:abc123def456 --policy-version v2.3.1
+```
+
+### Output
+
+The bundle contains:
+
+```
+audit-bundle-<digest>-<timestamp>/
+├── manifest.json              # Bundle manifest with cryptographic hashes
+├── README.md                  # Human-readable guide for auditors
+├── verdict/
+│   ├── verdict.json           # StellaVerdict artifact
+│   └── verdict.dsse.json      # DSSE envelope with signatures
+├── evidence/
+│   ├── sbom.json              # SBOM (CycloneDX format)
+│   ├── vex-statements/        # All VEX statements considered
+│   │   ├── index.json
+│   │   └── *.json
+│   ├── reachability/
+│   │   ├── analysis.json
+│   │   └── call-graph.dot     # Optional
+│   └── provenance/
+│       └── slsa-provenance.json
+├── policy/
+│   ├── policy-snapshot.json
+│   ├── gate-decision.json
+│   └── evaluation-trace.json
+├── replay/
+│   ├── knowledge-snapshot.json
+│   └── replay-instructions.md
+└── schema/                    # Optional
+    ├── verdict-schema.json
+    └── vex-schema.json
+```
+
+### Exit Codes
+
+| Code | Description |
+|------|-------------|
+| 0 | Bundle generated successfully |
+| 1 | Bundle generated with missing evidence (warnings) |
+| 2 | Error (artifact not found, permission denied, etc.) |
+
+---
+
+## stella audit verify
+
+Verify the integrity of an audit bundle.
+
+### Synopsis
+
+```
+stella audit verify <bundle-path> [options]
+```
+
+### Arguments
+
+| Argument | Description |
+|----------|-------------|
+| `<bundle-path>` | Path to audit bundle (directory or archive) |
+
+### Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--strict` | `false` | Fail on any missing optional files |
+| `--check-signatures` | `false` | Verify DSSE signatures |
+| `--trusted-keys <path>` | (none) | Path to trusted keys file for signature verification |
+
+### Examples
+
+```bash
+# Basic verification
+stella audit verify ./audit-bundle-abc123-20260117/
+
+# Strict mode (fail on any missing files)
+stella audit verify ./audit-bundle-abc123-20260117/ --strict
+
+# Verify signatures
+stella audit verify ./audit-bundle.tar.gz \
+  --check-signatures \
+  --trusted-keys ./trusted-keys.json
+
+# Verify archive directly
+stella audit verify ./audit-bundle-abc123.zip
+```
+
+### Output
+
+```
+Verifying bundle: ./audit-bundle-abc123-20260117/
+
+Bundle ID: urn:stella:audit-bundle:sha256:abc123...
+Artifact: sha256:abc123def456...
+Generated: 2026-01-17T10:30:00Z
+Files: 15
+
+Verifying files...
+✓ Verified 15/15 files
+✓ Integrity hash verified
+
+✓ Bundle integrity verified
+```
+
+### Exit Codes
+
+| Code | Description |
+|------|-------------|
+| 0 | Bundle is valid |
+| 1 | Bundle integrity check failed |
+| 2 | Error (bundle not found, invalid format, etc.) |
+
+---
+
+## Trusted Keys File Format
+
+For signature verification, provide a JSON file with trusted public keys:
+
+```json
+{
+  "keys": [
+    {
+      "keyId": "urn:stella:key:sha256:abc123...",
+      "publicKey": "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"
+    }
+  ]
+}
+```
+
+---
+
+## Use Cases
+
+### Generating Bundles for External Auditors
+
+```bash
+# Generate comprehensive bundle for SOC 2 audit
+stella audit bundle sha256:prod-release-v2.5 \
+  --format zip \
+  --include-schemas \
+  --output ./soc2-audit-2026/release-evidence.zip
+```
+
+### Verifying Received Bundles
+
+```bash
+# Verify bundle received from another team
+stella audit verify ./received-bundle.tar.gz --strict
+
+# Verify with signature checking
+stella audit verify ./received-bundle/ \
+  --check-signatures \
+  --trusted-keys ./company-signing-keys.json
+```
+
+### CI/CD Integration
+
+```yaml
+# GitLab CI example
+audit-bundle:
+  stage: release
+  script:
+    - stella audit bundle $IMAGE_DIGEST --format tar.gz --output ./audit/
+  artifacts:
+    paths:
+      - audit/
+    expire_in: 5 years
+```
+
+---
+
+## Related
+
+- [Audit Bundle Format Specification](../audit-bundle-format.md)
+- [stella replay](../replay.md) - Replay verdicts for verification
+- [stella export](export.md) - Export evidence in various formats
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/modules/cli/guides/commands/explain.md
+++ b/docs/modules/cli/guides/commands/explain.md
@@ -0,0 +1,313 @@
+# stella explain - Block Explanation Commands
+
+**Sprint:** SPRINT_20260117_026_CLI_why_blocked_command
+
+## Overview
+
+The `stella explain` command group provides commands for understanding why artifacts are blocked by policy gates. This addresses the M2 moat requirement: **"Explainability with proof, not narrative."**
+
+When an artifact is blocked, `stella explain` produces a **deterministic trace** with **referenced evidence artifacts**, enabling:
+- Clear understanding of which gate blocked the artifact
+- Actionable suggestions for remediation
+- Verifiable evidence chain
+- Deterministic replay for verification
+
+---
+
+## Commands
+
+### stella explain block
+
+Explain why an artifact was blocked by policy gates.
+
+**Usage:**
+```bash
+stella explain block <digest> [options]
+```
+
+**Arguments:**
+- `<digest>` - Artifact digest in any of these formats:
+  - `sha256:abc123...` - Full digest with algorithm prefix
+  - `abc123...` - Raw 64-character hex digest (assumed sha256)
+  - `registry.example.com/image@sha256:abc123...` - OCI reference (digest extracted)
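+
+All three accepted forms reduce to `sha256:<hex>`. A hypothetical sketch of that normalization (`normalize_digest` is not a CLI function); it requires exactly 64 lowercase hex characters, the length of a SHA-256 digest:
+
+```bash
+# Hypothetical normalization of the three accepted <digest> forms.
+# Prints "sha256:<hex>" on success; returns 1 for input it cannot parse.
+normalize_digest() {
+  local input="$1" hex
+  case "$input" in
+    *@sha256:*) hex="${input##*@sha256:}" ;;   # OCI reference: keep the digest part
+    sha256:*)   hex="${input#sha256:}" ;;      # already prefixed
+    *)          hex="$input" ;;                # raw hex, assumed sha256
+  esac
+  if [[ "$hex" =~ ^[0-9a-f]{64}$ ]]; then
+    echo "sha256:$hex"
+  else
+    echo "error: not a sha256 digest: $input" >&2
+    return 1
+  fi
+}
+```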
+
+**Options:**
+
+| Option | Alias | Description | Default |
+|--------|-------|-------------|---------|
+| `--format <format>` | `-f` | Output format: `table`, `json`, `markdown` | `table` |
+| `--show-evidence` | `-e` | Include full evidence artifact details | false |
+| `--show-trace` | `-t` | Include policy evaluation trace | false |
+| `--replay-token` | `-r` | Include replay token in output | false |
+| `--output <path>` | `-o` | Write to file instead of stdout | stdout |
+| `--offline` | | Query local verdict cache only | false |
+
+---
+
+## Output Formats
+
+### Table Format (Default)
+
+Human-readable format optimized for terminal display:
+
+```
+Artifact: sha256:abc123def4567890123456789012345678901234567890123456789012345678
+Status: BLOCKED
+
+Gate: VexTrust
+Reason: Trust score below threshold (0.45 < 0.70)
+Suggestion: Obtain VEX statement from trusted issuer or add issuer to trust registry
+
+Evidence:
+  [VEX   ] vex:sha256:de...23  vendor-x      2026-01-15T10:00:00Z
+  [REACH ] reach:sha256...56   static        2026-01-15T09:55:00Z
+
+Replay: stella verify verdict --verdict urn:stella:verdict:sha256:abc123:v2.3.0:1737108000
+```
+
+### JSON Format
+
+Machine-readable format for CI/CD integration:
+
+```json
+{
+  "artifact": "sha256:abc123def4567890123456789012345678901234567890123456789012345678",
+  "status": "BLOCKED",
+  "gate": "VexTrust",
+  "reason": "Trust score below threshold (0.45 < 0.70)",
+  "suggestion": "Obtain VEX statement from trusted issuer or add issuer to trust registry",
+  "evaluationTime": "2026-01-15T10:30:00+00:00",
+  "policyVersion": "v2.3.0",
+  "evidence": [
+    {
+      "type": "VEX",
+      "id": "vex:sha256:def456789abc123",
+      "source": "vendor-x",
+      "timestamp": "2026-01-15T10:00:00+00:00",
+      "retrieveCommand": "stella evidence get vex:sha256:def456789abc123"
+    },
+    {
+      "type": "REACH",
+      "id": "reach:sha256:789abc123def456",
+      "source": "static-analysis",
+      "timestamp": "2026-01-15T09:55:00+00:00",
+      "retrieveCommand": "stella evidence get reach:sha256:789abc123def456"
+    }
+  ],
+  "replayCommand": "stella verify verdict --verdict urn:stella:verdict:sha256:abc123:v2.3.0:1737108000"
+}
+```
+
+### Markdown Format
+
+Suitable for embedding in GitHub issues, PR comments, or documentation:
+
+````markdown
+## Block Explanation
+
+**Artifact:** `sha256:abc123def4567890123456789012345678901234567890123456789012345678`
+**Status:** BLOCKED
+
+### Gate Decision
+
+| Property | Value |
+|----------|-------|
+| Gate | VexTrust |
+| Reason | Trust score below threshold (0.45 < 0.70) |
+| Suggestion | Obtain VEX statement from trusted issuer or add issuer to trust registry |
+| Policy Version | v2.3.0 |
+
+### Evidence
+
+| Type | ID | Source | Timestamp |
+|------|-----|--------|-----------|
+| VEX | `vex:sha256:de...23` | vendor-x | 2026-01-15 10:00 |
+| REACH | `reach:sha256...56` | static-analysis | 2026-01-15 09:55 |
+
+### Verification
+
+```bash
+stella verify verdict --verdict urn:stella:verdict:sha256:abc123:v2.3.0:1737108000
+```
+````
+
+---
+
+## Examples
+
+### Basic Block Explanation
+
+```bash
+# Get basic explanation of why an artifact is blocked
+stella explain block sha256:abc123def4567890123456789012345678901234567890123456789012345678
+```
+
+### JSON Output for CI/CD
+
+```bash
+# Get JSON output for parsing in CI/CD pipeline
+stella explain block sha256:abc123... --format json --output block-reason.json
+
+# Parse in CI/CD
+GATE=$(jq -r '.gate' block-reason.json)
+REASON=$(jq -r '.reason' block-reason.json)
+echo "Blocked by $GATE: $REASON"
+```
+
+### Full Explanation with Evidence and Trace
+
+```bash
+# Get complete explanation with all details
+stella explain block sha256:abc123... \
+  --show-evidence \
+  --show-trace \
+  --replay-token \
+  --format table
+```
+
+### Markdown for PR Comment
+
+```bash
+# Generate markdown for GitHub PR comment
+stella explain block sha256:abc123... --format markdown --output comment.md
+
+# Use with gh CLI
+gh pr comment 123 --body-file comment.md
+```
+
+### Retrieve Evidence Artifacts
+
+```bash
+# Get explanation
+stella explain block sha256:abc123... --show-evidence
+
+# Retrieve specific evidence artifacts
+stella evidence get vex:sha256:def456789abc123
+stella evidence get reach:sha256:789abc123def456
+```
+
+### Verify Deterministic Replay
+
+```bash
+# Get the replay command (--replay-token includes it in the output)
+REPLAY=$(stella explain block sha256:abc123... --format json --replay-token | jq -r '.replayCommand')
+
+# Execute replay verification
+eval "$REPLAY"
+```
+
+---
+
+## Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| `0` | Artifact is NOT blocked (all gates passed) |
+| `1` | Artifact IS blocked (one or more gates failed) |
+| `2` | Error (artifact not found, API error, etc.) |
+
+**CI/CD Integration:**
+
+```bash
+# Fail pipeline if artifact is blocked
+# Capture the exit code directly (also safe under `set -e`)
+EXIT_CODE=0
+stella explain block sha256:abc123... --format json > /dev/null 2>&1 || EXIT_CODE=$?
+
+if [ "$EXIT_CODE" -eq 1 ]; then
+  echo "ERROR: Artifact is blocked by policy"
+  stella explain block sha256:abc123... --format markdown
+  exit 1
+elif [ "$EXIT_CODE" -ne 0 ]; then
+  echo "ERROR: Could not retrieve block status"
+  exit 2
+fi
+```
+
+---
+
+## Evidence Types
+
+The `explain block` command returns evidence artifacts that contributed to the gate decision:
+
+| Type | Description | Source |
+|------|-------------|--------|
+| `VEX` | VEX (Vulnerability Exploitability eXchange) statement | VEX issuers, vendor security teams |
+| `REACH` | Reachability analysis result | Static analysis, call graph analysis |
+| `SBOM` | Software Bill of Materials | SBOM generators, build systems |
+| `SCAN` | Vulnerability scan result | Scanner service |
+| `ATTEST` | Attestation document | Attestor service, SLSA provenance |
+| `POLICY` | Policy evaluation result | Policy engine |
+
+---
+
+## Determinism Guarantee
+
+All output from `stella explain block` is **deterministic**:
+
+1. **Same inputs produce identical outputs** - Given the same artifact digest and policy version, the output is byte-for-byte identical
+2. **Evidence is sorted** - Evidence artifacts are sorted by timestamp (ascending)
+3. **Trace is sorted** - Evaluation trace steps are sorted by step number
+4. **Timestamps use ISO 8601** - All timestamps use ISO 8601 format with UTC offset
+5. **JSON uses canonical ordering** - JSON properties are ordered consistently
+
+This enables:
+- **Replay verification** - Use the replay token to verify the decision can be reproduced
+- **Audit trails** - Compare explanations across time
+- **Cache validation** - Verify cached decisions match current evaluation
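+
+As a minimal local check of the byte-for-byte guarantee, you can save two runs of the same command and compare their SHA-256 digests. The sketch below assumes GNU coreutils `sha256sum`; the function name `outputs_identical` is hypothetical and not part of the Stella CLI.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Hypothetical helper: succeed if two saved explanation files are
+# byte-for-byte identical, by comparing their SHA-256 digests.
+outputs_identical() {
+  local a b
+  a=$(sha256sum < "$1" | cut -d' ' -f1)
+  b=$(sha256sum < "$2" | cut -d' ' -f1)
+  if [ "$a" = "$b" ]; then
+    echo "identical ($a)"
+  else
+    echo "DIFFERENT: $a != $b"
+    return 1
+  fi
+}
+```
+
+For example, save one run with `stella explain block <digest> --format json --output run1.json` and a second with `--output run2.json`, then call `outputs_identical run1.json run2.json`. Because `explain block` exits with `1` for blocked artifacts, scripts running under `set -e` must tolerate that exit code on both runs.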
+
+---
+
+## Troubleshooting
+
+### Artifact Not Found
+
+```
+Error: Artifact sha256:abc123... not found in registry or evidence store.
+```
+
+**Causes:**
+- Artifact was never scanned
+- Artifact digest is incorrect
+- Artifact was deleted from registry
+
+**Solutions:**
+```bash
+# Verify artifact exists
+stella image inspect sha256:abc123...
+
+# Scan the artifact
+stella scan docker://myregistry/myimage@sha256:abc123...
+```
+
+### Not Blocked
+
+```
+Artifact sha256:abc123... is NOT blocked. All policy gates passed.
+```
+
+This means the artifact passed all policy evaluations. Exit code will be `0`.
+
+### API Error
+
+```
+Error: Policy service unavailable
+```
+
+**Solutions:**
+```bash
+# Check connectivity
+stella doctor --check check.policy.connectivity
+
+# Use offline mode if available
+stella explain block sha256:abc123... --offline
+```
+
+---
+
+## See Also
+
+- [Policy Commands](policy.md) - Policy management and testing
+- [VEX Commands](vex.md) - VEX document management
+- [Evidence Commands](evidence.md) - Evidence retrieval and verification
+- [Verify Commands](verify.md) - Verdict verification and replay
+- [Command Reference](reference.md) - Complete command reference
--- a/docs/modules/cli/guides/commands/reference.md
+++ b/docs/modules/cli/guides/commands/reference.md
@@ -13,6 +13,7 @@ graph TD
    CLI --> ADMIN[Administration]
    CLI --> AUTH[Authentication]
    CLI --> POLICY[Policy Management]
+    CLI --> EXPLAIN[Explainability]
    CLI --> VEX[VEX & Decisioning]
    CLI --> SBOM[SBOM Operations]
    CLI --> REPORT[Reporting & Export]
@@ -914,6 +915,73 @@ Platform: linux-x64

 ---

+## Explainability Commands
+
+### stella explain block
+
+Explain why an artifact was blocked by policy gates. Produces a deterministic trace that references the supporting evidence artifacts.
+
+**Sprint:** SPRINT_20260117_026_CLI_why_blocked_command  
+**Moat Reference:** M2 (Explainability with proof, not narrative)
+
+**Usage:**
+```bash
+stella explain block <digest> [options]
+```
+
+**Arguments:**
+- `<digest>` - Artifact digest (`sha256:abc123...`, raw hex, or OCI reference)
+
+**Options:**
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--format <format>` | Output format: `table`, `json`, `markdown` | `table` |
+| `--show-evidence` | Include full evidence artifact details | false |
+| `--show-trace` | Include policy evaluation trace | false |
+| `--replay-token` | Include replay token in output | false |
+| `--output <path>` | Write to file instead of stdout | stdout |
+| `--offline` | Query local verdict cache only | false |
+
+**Examples:**
+```bash
+# Basic explanation
+stella explain block sha256:abc123def456...
+
+# JSON output for CI/CD
+stella explain block sha256:abc123... --format json --output reason.json
+
+# Full explanation with evidence and trace
+stella explain block sha256:abc123... --show-evidence --show-trace
+
+# Markdown for PR comment
+stella explain block sha256:abc123... --format markdown | gh pr comment 123 --body-file -
+```
+
+**Exit Codes:**
+- `0` - Artifact is NOT blocked (all gates passed)
+- `1` - Artifact IS blocked
+- `2` - Error (not found, API error)
+
+**Output (table):**
+```
+Artifact: sha256:abc123def456789012345678901234567890123456789012345678901234
+Status: BLOCKED
+
+Gate: VexTrust
+Reason: Trust score below threshold (0.45 < 0.70)
+Suggestion: Obtain VEX statement from trusted issuer
+
+Evidence:
+  [VEX   ] vex:sha256:de...23  vendor-x  2026-01-15T10:00:00Z
+  [REACH ] reach:sha256...56   static    2026-01-15T09:55:00Z
+
+Replay: stella verify verdict --verdict urn:stella:verdict:sha256:abc123:v2.3.0:1737108000
+```
+
+**See Also:** [Explain Commands Documentation](explain.md)
+
+---
+
 ## Additional Commands

 ### stella vuln query
--- a/docs/modules/telemetry/guides/p0-metrics.md
+++ b/docs/modules/telemetry/guides/p0-metrics.md
@@ -0,0 +1,333 @@
+# P0 Product Metrics
+
+> **Sprint:** SPRINT_20260117_028_Telemetry_p0_metrics  
+> **Task:** P0M-007 - Documentation
+
+This document describes the four P0 (highest priority) product-level metrics for tracking Stella Ops operational health.
+
+## Overview
+
+These metrics serve as the primary scoreboard for product health and should guide prioritization decisions. Per the AI Economics Moat advisory: "Prioritize work that improves them."
+
+| Metric | Target | Alert Threshold |
+|--------|--------|-----------------|
+| Time to First Verified Release | P90 < 4 hours | P90 > 24 hours |
+| Mean Time to Answer "Why Blocked" | P90 < 5 minutes | P90 > 1 hour |
+| Support Minutes per Customer | Trend toward 0 | > 30 min/month |
+| Determinism Regressions | Zero | Any policy-level |
+
+---
+
+## Metric 1: Time to First Verified Release
+
+**Name:** `stella_time_to_first_verified_release_seconds`  
+**Type:** Histogram
+
+### Definition
+
+Elapsed time from fresh install (first service startup) to first successful verified promotion (policy gate passed, evidence recorded).
+
+### Labels
+
+| Label | Values | Description |
+|-------|--------|-------------|
+| `tenant` | (varies) | Tenant identifier |
+| `deployment_type` | `fresh`, `upgrade` | Type of installation |
+
+### Histogram Buckets
+
+5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h (1 week)
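+
+Assuming Prometheus-style cumulative buckets (each bucket counts observations less than or equal to its `le` bound, plus an implicit `+Inf` bucket), the bounds above correspond to the values in seconds below. The helper is a hypothetical illustration of which bound first includes an observation; the authoritative bucket definition lives in `P0ProductMetrics.cs`.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Metric 1 bucket bounds in seconds: 5m 15m 30m 1h 2h 4h 8h 24h 48h 168h.
+TTFVR_BUCKETS=(300 900 1800 3600 7200 14400 28800 86400 172800 604800)
+
+# Print the smallest bucket bound (le) that includes an observation in seconds.
+smallest_bucket() {
+  local value=$1 bound
+  for bound in "${TTFVR_BUCKETS[@]}"; do
+    if [ "$value" -le "$bound" ]; then
+      echo "$bound"
+      return 0
+    fi
+  done
+  echo "+Inf"
+}
+```
+
+For example, a first verified release 2.5 hours (9000 s) after install first falls in the `le="14400"` (4h) bucket and, cumulatively, in every larger bucket.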
+
+### Collection Points
+
+1. **Install timestamp** - Recorded on first Authority service startup
+2. **First promotion** - Recorded in Release Orchestrator on first verified promotion
+
+### Why This Matters
+
+A short time-to-first-release indicates:
+- Good onboarding experience
+- Clear documentation
+- Sensible default configurations
+- Working integrations
+
+### Dashboard Usage
+
+The Grafana dashboard shows:
+- Histogram heatmap of time distribution
+- P50/P90/P99 statistics
+- Trend over time
+
+### Alert Response
+
+**Warning (P90 > 4 hours):**
+1. Review recent onboarding experiences
+2. Check for common configuration issues
+3. Review documentation clarity
+
+**Critical (P90 > 24 hours):**
+1. Investigate blocked customers
+2. Check for integration failures
+3. Consider guided onboarding assistance
+
+---
+
+## Metric 2: Mean Time to Answer "Why Blocked"
+
+**Name:** `stella_why_blocked_latency_seconds`  
+**Type:** Histogram
+
+### Definition
+
+Time from block decision to user viewing explanation (via CLI, UI, or API).
+
+### Labels
+
+| Label | Values | Description |
+|-------|--------|-------------|
+| `tenant` | (varies) | Tenant identifier |
+| `surface` | `cli`, `ui`, `api` | Interface used to view explanation |
+| `resolution_type` | `immediate`, `delayed` | Same session vs different session |
+
+### Histogram Buckets
+
+1s, 5s, 30s, 1m, 5m, 15m, 1h, 4h, 24h
+
+### Collection Points
+
+1. **Block decision** - Timestamp stored in verdict
+2. **Explanation view** - Tracked when `stella explain block` or UI equivalent invoked
+
+### Why This Matters
+
+Short "why blocked" latency indicates:
+- Clear block messaging
+- Discoverable explanation tools
+- Good explainability UX
+
+Long latency may indicate:
+- Users confused about where to find answers
+- Documentation gaps
+- UX friction
+
+### Dashboard Usage
+
+The Grafana dashboard shows:
+- Histogram heatmap of latency distribution
+- Trend line over time
+- Breakdown by surface (CLI vs UI vs API)
+
+### Alert Response
+
+**Warning (P90 > 5 minutes):**
+1. Review block notification messaging
+2. Check CLI command discoverability
+3. Verify UI links are prominent
+
+**Critical (P90 > 1 hour):**
+1. Investigate user flows
+2. Add proactive notifications
+3. Review documentation and help text
+
+---
+
+## Metric 3: Support Minutes per Customer
+
+**Name:** `stella_support_burden_minutes_total`  
+**Type:** Counter
+
+### Definition
+
+Accumulated support time per customer per month. This is a manual/semi-automated metric for solo operations tracking.
+
+### Labels
+
+| Label | Values | Description |
+|-------|--------|-------------|
+| `tenant` | (varies) | Tenant identifier |
+| `category` | `install`, `config`, `policy`, `integration`, `bug`, `other` | Support category |
+| `month` | YYYY-MM | Month of support |
+
+### Collection
+
+Log support interactions using:
+
+```bash
+stella ops support log --tenant <id> --minutes <n> --category <cat>
+```
+
+Or via API:
+
+```http
+POST /v1/ops/support/log
+{
+  "tenant": "acme-corp",
+  "minutes": 15,
+  "category": "config"
+}
+```
+
+### Why This Matters
+
+This metric tracks operational scalability. For solo-scaled operations:
+- Support burden should trend toward zero
+- High support minutes indicate product gaps
+- Categories identify areas needing improvement
+
+### Dashboard Usage
+
+The Grafana dashboard shows:
+- Stacked bar chart by category
+- Monthly trend per tenant
+- Total support burden
+
+### Alert Response
+
+**Warning (> 30 min/month per tenant):**
+1. Review support interactions for patterns
+2. Identify documentation gaps
+3. Create runbooks for common issues
+
+**Critical (> 60 min/month per tenant):**
+1. Escalate to product for feature work
+2. Consider dedicated support time
+3. Prioritize automation
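+
+If logged support interactions are also exported as rows of `tenant,month,category,minutes` (a hypothetical CSV layout with a header row; this guide does not define an export format), the monthly totals and the alert levels above can be reproduced with a short `awk` script:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Sum support minutes per tenant and month from a hypothetical CSV export
+# (tenant,month,category,minutes) and apply the Metric 3 thresholds:
+# > 30 min/month = warning, > 60 min/month = critical.
+support_burden_report() {
+  awk -F',' '
+    NR > 1 { total[$1 "," $2] += $4 }   # skip the header row
+    END {
+      for (k in total) {
+        level = "ok"
+        if (total[k] > 60) level = "critical"
+        else if (total[k] > 30) level = "warning"
+        print k "," total[k] "," level
+      }
+    }' "$1" | LC_ALL=C sort
+}
+```
+
+Each output line has the form `tenant,month,minutes,level`; sorting makes the report order stable, since `awk` does not guarantee the iteration order of `for (k in total)`.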
+
+---
+
+## Metric 4: Determinism Regressions
+
+**Name:** `stella_determinism_regressions_total`  
+**Type:** Counter
+
+### Definition
+
+Count of detected determinism failures in production (same inputs produced different outputs).
+
+### Labels
+
+| Label | Values | Description |
+|-------|--------|-------------|
+| `tenant` | (varies) | Tenant identifier |
+| `component` | `scanner`, `policy`, `attestor`, `export` | Component with regression |
+| `severity` | `bitwise`, `semantic`, `policy` | Fidelity tier of regression |
+
+### Severity Tiers
+
+| Tier | Description | Impact |
+|------|-------------|--------|
+| `bitwise` | Byte-for-byte output differs | Low - cosmetic |
+| `semantic` | Output semantically differs | Medium - potential confusion |
+| `policy` | Policy decision differs | **Critical** - audit risk |
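+
+To make the tiers concrete, the sketch below classifies the difference between two outputs produced from the same inputs. It is a deliberately crude, hypothetical approximation: semantic equivalence is approximated by ignoring all whitespace, and the policy decision is read from `"decision"` fields with `grep`. A real check would compare canonicalized JSON against the verdict model.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Extract all "decision" fields (whitespace removed) from a JSON file.
+decision_of() {
+  { grep -o '"decision"[[:space:]]*:[[:space:]]*"[A-Z_]*"' "$1" || true; } | tr -d '[:space:]'
+}
+
+# Print none | bitwise | semantic | policy for two outputs of the same inputs.
+classify_regression() {
+  if cmp -s "$1" "$2"; then
+    echo none
+  elif [ "$(decision_of "$1")" != "$(decision_of "$2")" ]; then
+    echo policy
+  elif [ "$(tr -d '[:space:]' < "$1")" != "$(tr -d '[:space:]' < "$2")" ]; then
+    echo semantic
+  else
+    echo bitwise   # only formatting/whitespace differs
+  fi
+}
+```
+
+The checks run from most to least severe, so a run whose decision changed is always reported as `policy` even if its formatting also differs.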
+
+### Collection Points
+
+1. **Scheduled verification jobs** - Regular determinism checks
+2. **Replay verification failures** - User-initiated replays
+3. **CI golden test failures** - Development-time detection
+
+### Why This Matters
+
+Determinism is a core moat. Regressions indicate:
+- Non-deterministic code introduced
+- External dependency changes
+- Time-sensitive logic bugs
+
+**Policy-level regressions are audit-breaking** and must be fixed immediately.
+
+### Dashboard Usage
+
+The Grafana dashboard shows:
+- Counter with severity breakdown
+- Alert status indicator
+- Historical trend
+
+### Alert Response
+
+**Warning (any bitwise/semantic):**
+1. Review recent deployments
+2. Check for dependency updates
+3. Investigate affected component
+
+**Critical (any policy):**
+1. **Immediate investigation required**
+2. Consider rollback
+3. Review all recent policy decisions
+4. Notify affected customers
+
+---
+
+## Dashboard Access
+
+The P0 metrics dashboard is available at:
+
+```
+/grafana/d/stella-p0-metrics
+```
+
+Or directly:
+```bash
+stella ops dashboard p0
+```
+
+### Dashboard Features
+
+- **Tenant selector** - Filter by specific tenant
+- **Time range** - Adjust analysis window
+- **SLO indicators** - Green/yellow/red status
+- **Drill-down links** - Navigate to detailed views
+
+---
+
+## Alerting Configuration
+
+Alerts are configured in `devops/telemetry/alerts/stella-p0-alerts.yml`.
+
+### Alert Channels
+
+Configure alert destinations in Grafana:
+- Slack/Teams for warnings
+- PagerDuty for critical alerts
+- Email for summaries
+
+### Silencing Alerts
+
+During maintenance windows:
+```bash
+stella ops alerts silence --duration 2h --reason "Planned maintenance"
+```
+
+---
+
+## Implementation Notes
+
+### Source Files
+
+| Component | Location |
+|-----------|----------|
+| Metric definitions | `src/Telemetry/StellaOps.Telemetry.Core/P0ProductMetrics.cs` |
+| Install timestamp | `src/Telemetry/StellaOps.Telemetry.Core/InstallTimestampService.cs` |
+| Dashboard template | `devops/telemetry/grafana/dashboards/stella-ops-p0-metrics.json` |
+| Alert rules | `devops/telemetry/alerts/stella-p0-alerts.yml` |
+
+### Adding Custom Metrics
+
+To add additional P0-level metrics:
+
+1. Define in `P0ProductMetrics.cs`
+2. Add collection points in relevant services
+3. Create dashboard panel in Grafana JSON
+4. Add alert rules
+5. Update this documentation
+
+---
+
+## Related
+
+- [Observability Guide](observability.md)
+- [Alerting Configuration](alerting.md)
+- [Runbook: Metric Collection Issues](../../../operations/runbooks/telemetry-metrics-ops.md)
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/guides/auditor-guide.md
+++ b/docs/operations/guides/auditor-guide.md
@@ -0,0 +1,256 @@
+# Auditor Guide
+
+> **Sprint:** SPRINT_20260117_027_CLI_audit_bundle_command  
+> **Task:** AUD-007 - Documentation
+
+This guide is for external auditors reviewing Stella Ops release evidence.
+
+## Overview
+
+Stella Ops generates comprehensive, tamper-evident audit bundles that contain all evidence required to verify release decisions. This guide explains how to interpret and verify these bundles.
+
+## Receiving an Audit Bundle
+
+Audit bundles may be delivered as:
+- **Directory:** A folder containing all evidence files
+- **Archive:** A `.tar.gz` or `.zip` file
+
+### Extracting Archives
+
+```bash
+# tar.gz
+tar -xzf audit-bundle-sha256-abc123.tar.gz
+
+# zip
+unzip audit-bundle-sha256-abc123.zip
+```
+
+## Bundle Structure
+
+```
+audit-bundle-<digest>-<timestamp>/
+├── manifest.json              # Integrity manifest
+├── README.md                  # Quick reference
+├── verdict/                   # Release decision
+├── evidence/                  # Supporting evidence
+├── policy/                    # Policy configuration
+└── replay/                    # Verification instructions
+```
+
+## Step 1: Verify Bundle Integrity
+
+Before reviewing contents, verify the bundle has not been tampered with.
+
+### Using Stella CLI
+
+```bash
+stella audit verify ./audit-bundle-sha256-abc123/
+```
+
+Expected output:
+```
+✓ Verified 15/15 files
+✓ Integrity hash verified
+✓ Bundle integrity verified
+```
+
+### Manual Verification
+
+1. Open `manifest.json`
+2. For each file listed, compute SHA-256 and compare:
+   ```bash
+   sha256sum verdict/verdict.json
+   ```
+3. Recompute the `integrityHash` from the individual file hashes and compare it with the value recorded in `manifest.json`
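+
+Steps 2 and 3 can be scripted with GNU coreutils once the per-file hashes from `manifest.json` are written out in `sha256sum` format (`<hex>  <relative path>` per line); how to extract them depends on the manifest schema, which this guide does not show. The aggregate below is a hypothetical construction (SHA-256 over the path-sorted list) and matches `integrityHash` only if the bundle format defines it the same way. Prefer `stella audit verify` for the authoritative check.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Verify every listed file in the bundle against a sha256sum-style list.
+# $1 = bundle directory, $2 = absolute path to the checksum list.
+verify_bundle_files() {
+  (cd "$1" && sha256sum --check --quiet "$2")
+}
+
+# Hypothetical aggregate hash: SHA-256 over the checksum list sorted by path.
+aggregate_hash() {
+  LC_ALL=C sort -k2 "$1" | sha256sum | cut -d' ' -f1
+}
+```
+
+`sha256sum --check` exits non-zero and prints `FAILED` for any file whose content no longer matches, which is the signal to stop the review and request a fresh bundle.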
+
+## Step 2: Review the Verdict
+
+The verdict is the official release decision.
+
+### verdict/verdict.json
+
+```json
+{
+  "artifactDigest": "sha256:abc123...",
+  "decision": "PASS",
+  "timestamp": "2026-01-17T10:25:00Z",
+  "gates": [
+    {
+      "gateId": "sbom-required",
+      "status": "PASS",
+      "reason": "Valid CycloneDX SBOM present"
+    },
+    {
+      "gateId": "vex-trust",
+      "status": "PASS", 
+      "reason": "Trust score 0.85 >= 0.70 threshold"
+    }
+  ]
+}
+```
+
+### Decision Values
+
+| Decision | Meaning |
+|----------|---------|
+| `PASS` | All gates passed, artifact approved for deployment |
+| `BLOCKED` | One or more gates failed, artifact not approved |
+| `PENDING` | Evaluation incomplete, awaiting additional evidence |
+
+### verdict/verdict.dsse.json
+
+This file contains the cryptographically signed verdict envelope (DSSE format). Verify signatures using:
+
+```bash
+stella audit verify ./bundle/ --check-signatures
+```
+
+## Step 3: Review Evidence
+
+### evidence/sbom.json
+
+Software Bill of Materials (SBOM) listing all components in the artifact.
+
+**Key fields:**
+- `components[]` - List of all software components
+- `dependencies[]` - Dependency relationships
+- `metadata.timestamp` - When SBOM was generated
+
+### evidence/vex-statements/
+
+Vulnerability Exploitability eXchange (VEX) statements that justify vulnerability assessments.
+
+**index.json** (excerpt):
+```json
+{
+  "statementCount": 3,
+  "statements": [
+    {"fileName": "vex-001.json", "source": "vendor-security"},
+    {"fileName": "vex-002.json", "source": "internal-analysis"}
+  ]
+}
+```
+
+Each VEX statement explains why a vulnerability does or does not affect this artifact.
+
+### evidence/reachability/analysis.json
+
+Reachability analysis showing which vulnerabilities are actually reachable in the code.
+
+```json
+{
+  "components": [
+    {
+      "purl": "pkg:npm/lodash@4.17.20",
+      "vulnerabilities": [
+        {
+          "id": "CVE-2021-23337",
+          "reachable": false,
+          "reason": "Vulnerable function not in call graph"
+        }
+      ]
+    }
+  ]
+}
+```
+
+## Step 4: Review Policy
+
+### policy/policy-snapshot.json
+
+The policy configuration used for evaluation:
+
+```json
+{
+  "policyVersion": "v2.3.1",
+  "gates": ["sbom-required", "vex-trust", "cve-threshold"],
+  "thresholds": {
+    "vexTrustScore": 0.70,
+    "maxCriticalCves": 0,
+    "maxHighCves": 5
+  }
+}
+```
+
+### policy/gate-decision.json
+
+Detailed breakdown of each gate evaluation:
+
+```json
+{
+  "gates": [
+    {
+      "gateId": "vex-trust",
+      "decision": "PASS",
+      "inputs": {
+        "vexStatements": 3,
+        "trustScore": 0.85,
+        "threshold": 0.70
+      }
+    }
+  ]
+}
+```
+
+## Step 5: Replay Verification (Optional)
+
+For maximum assurance, you can replay the verdict evaluation.
+
+### Using Stella CLI
+
+```bash
+cd audit-bundle-sha256-abc123/
+stella replay snapshot --manifest replay/knowledge-snapshot.json
+```
+
+This re-evaluates the policy using the frozen inputs and should produce an identical verdict.
+
+### Manual Replay Steps
+
+See `replay/replay-instructions.md` for detailed steps.
+
+## Compliance Mapping
+
+| Compliance Framework | Relevant Bundle Components |
+|---------------------|---------------------------|
+| **SOC 2 (CC7.1)** | verdict/, policy/ |
+| **ISO 27001 (A.12.6)** | evidence/sbom.json |
+| **FedRAMP** | All components |
+| **SLSA Level 3** | evidence/provenance/ |
+
+## Common Questions
+
+### Q: Why was this artifact blocked?
+
+Review `policy/gate-decision.json` for the specific gate that failed and its reason.
+
+### Q: How do I verify the SBOM is accurate?
+
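+The SBOM file's hash is recorded in `manifest.json`, so a successful integrity check shows the SBOM has not changed since the bundle was created. It does not show that the SBOM is complete; to assess accuracy, compare it against the organization's SBOM generation process.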
+
+### Q: What if replay produces a different result?
+
+This may indicate:
+1. Policy version mismatch
+2. Missing evidence files
+3. Time-dependent policy rules
+
+Contact the organization's security team for clarification.
+
+### Q: How long should audit bundles be retained?
+
+Stella Ops recommends:
+- Production releases: 5 years minimum
+- Security-critical systems: 7 years
+- Regulated industries: Per compliance requirements
+
+## Support
+
+For questions about this audit bundle:
+1. Contact the organization's Stella Ops administrator
+2. Reference the Bundle ID from `manifest.json`
+3. Include the artifact digest
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/runbooks/COVERAGE.md
+++ b/docs/operations/runbooks/COVERAGE.md
@@ -0,0 +1,112 @@
+# Runbook Coverage Tracking
+
+This document tracks operational runbook coverage across Stella Ops modules.
+
+**Target:** 80% coverage of critical failure modes before declaring the operability moat achieved.
+
+---
+
+## Coverage Summary
+
+| Module | Critical Failures | Runbooks | Coverage | Status |
+|--------|-------------------|----------|----------|--------|
+| Scanner | 5 | 0 | 0% | 🔴 Gap |
+| Policy Engine | 5 | 0 | 0% | 🔴 Gap |
+| Release Orchestrator | 5 | 0 | 0% | 🔴 Gap |
+| Attestor | 5 | 0 | 0% | 🔴 Gap |
+| Feed Connectors | 4 | 0 | 0% | 🔴 Gap |
+| **Database (Postgres)** | 4 | 4 | 100% | ✅ Complete |
+| **Crypto Subsystem** | 4 | 4 | 100% | ✅ Complete |
+| **Evidence Locker** | 4 | 4 | 100% | ✅ Complete |
+| **Backup/Restore** | 4 | 4 | 100% | ✅ Complete |
+| Authority (OAuth/OIDC) | 3 | 0 | 0% | 🔴 Gap |
+| **Overall** | **43** | **16** | **37%** | 🟡 In Progress |
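+
+The overall figure and the distance to the 80% target can be recomputed from the table totals. The helper below is hypothetical (not part of the Stella CLI); with 16 of 43 critical failure modes covered, it reports 37% coverage and 19 more failure modes to cover (35 in total) to reach 80%.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# $1 = critical failure modes, $2 = covered failure modes, $3 = target percent.
+coverage_gap() {
+  awk -v total="$1" -v covered="$2" -v target="$3" 'BEGIN {
+    pct = int(100 * covered / total)          # rounded down
+    need = int(total * target / 100)
+    if (need * 100 < total * target) need++   # ceil(total * target / 100)
+    gap = need - covered
+    if (gap < 0) gap = 0
+    printf "coverage=%d%% need=%d more=%d\n", pct, need, gap
+  }'
+}
+```
+
+Rounding the requirement up matters here: 80% of 43 is 34.4, so 34 covered failure modes would still fall short of the target.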
+
+---
+
+## Available Runbooks
+
+### Database Operations
+- [postgres-ops.md](postgres-ops.md) - PostgreSQL database operations
+
+### Crypto Subsystem
+- [crypto-ops.md](crypto-ops.md) - Regional crypto operations (FIPS, eIDAS, GOST, SM)
+
+### Evidence Locker
+- [evidence-locker-ops.md](evidence-locker-ops.md) - Evidence locker operations
+
+### Backup/Restore
+- [backup-restore-ops.md](backup-restore-ops.md) - Backup and restore procedures
+
+### Vulnerability Operations
+- [vuln-ops.md](vuln-ops.md) - Vulnerability management operations
+
+### VEX Operations
+- [vex-ops.md](vex-ops.md) - VEX statement operations
+
+### Policy Incidents
+- [policy-incident.md](policy-incident.md) - Policy-related incident response
+
+---
+
+## Gap Analysis
+
+### High Priority Gaps (Critical modules without runbooks)
+
+1. **Scanner** - Core scanning functionality
+   - Worker stuck
+   - OOM on large images
+   - Registry auth failures
+
+2. **Policy Engine** - Policy evaluation
+   - Slow evaluation
+   - OPA crashes
+   - Compilation failures
+
+3. **Release Orchestrator** - Promotion workflow
+   - Stuck promotions
+   - Gate timeouts
+   - Missing evidence
+
+### Medium Priority Gaps
+
+4. **Attestor** - Signing and verification
+   - Signing failures
+   - Key expiration
+   - Rekor unavailability
+
+5. **Feed Connectors** - Advisory feeds
+   - NVD failures
+   - Rate limiting
+   - Offline bundle issues
+
+### Lower Priority Gaps
+
+6. **Authority** - Authentication
+   - Token validation failures
+   - OIDC provider issues
+
+---
+
+## Template
+
+New runbooks should use the template: [_template.md](_template.md)
+
+---
+
+## Doctor Check Integration
+
+Runbooks should be linked from Doctor check output. Current integration status:
+
+| Module | Doctor Checks | Linked to Runbook |
+|--------|---------------|-------------------|
+| Postgres | 4 | 0 |
+| Crypto | 8 | 0 |
+| Storage | 3 | 0 |
+| Evidence | 4 | 0 |
+
+**Next step:** Update Doctor check implementations to include runbook links in remediation output.
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/runbooks/_template.md
+++ b/docs/operations/runbooks/_template.md
@@ -0,0 +1,157 @@
+# Runbook: [Component] - [Failure Scenario]
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage  
+> **Task:** RUN-001 - Runbook Template
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | [Module name: Scanner, Policy, Orchestrator, Attestor, etc.] |
+| **Severity** | Critical / High / Medium / Low |
+| **On-call scope** | [Who should be paged: Platform team, Security team, etc.] |
+| **Last updated** | [YYYY-MM-DD] |
+| **Doctor check** | [Check ID if applicable, e.g., `check.scanner.worker-health`] |
+
+---
+
+## Symptoms
+
+Observable indicators that this failure is occurring:
+
+- [ ] [Symptom 1: e.g., "Scan jobs stuck in pending state for >5 minutes"]
+- [ ] [Symptom 2: e.g., "Error logs contain 'worker timeout exceeded'"]
+- [ ] [Metric/alert that fires: e.g., "Alert `ScannerWorkerStuck` firing"]
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | [e.g., "New scans cannot complete, blocking CI/CD pipelines"] |
+| **Data integrity** | [e.g., "No data loss, but stale scan results may be served"] |
+| **SLA impact** | [e.g., "Scan latency SLO violated if not resolved within 15 minutes"] |
+
+---
+
+## Diagnosis
+
+### Quick checks (< 2 minutes)
+
+Run these first to confirm the failure:
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check [relevant-check-id]
+   ```
+
+2. **Check service status:**
+   ```bash
+   stella [component] status
+   ```
+
+3. **Check recent logs:**
+   ```bash
+   stella [component] logs --tail 50 --level error
+   ```
+
+### Deep diagnosis (if quick checks inconclusive)
+
+1. **[Investigation step 1]:**
+   ```bash
+   [command]
+   ```
+   Expected output: [description]
+   If unexpected: [what it means]
+
+2. **[Investigation step 2]:**
+   ```bash
+   [command]
+   ```
+
+3. **Check related services:**
+   - Postgres connectivity: `stella doctor --check check.postgres.connectivity`
+   - Valkey connectivity: `stella doctor --check check.storage.valkey`
+   - Network connectivity: `stella doctor --check check.network.[target]`
+
+---
+
+## Resolution
+
+### Immediate mitigation (restore service quickly)
+
+Use these steps to restore service, even if the root cause isn't fixed yet:
+
+1. **[Mitigation step 1]:**
+   ```bash
+   [command]
+   ```
+   This will: [explanation]
+
+2. **[Mitigation step 2]:**
+   ```bash
+   [command]
+   ```
+
+### Root cause fix
+
+Once service is restored, address the underlying issue:
+
+1. **[Fix step 1]:**
+   ```bash
+   [command]
+   ```
+
+2. **[Fix step 2]:**
+   ```bash
+   [command]
+   ```
+
+3. **Verify fix is complete:**
+   ```bash
+   stella doctor --check [relevant-check-id]
+   ```
+
+### Verification
+
+Confirm the issue is fully resolved:
+
+```bash
+# Re-run the failing operation
+stella [component] [test-command]
+
+# Verify metrics are healthy
+stella obs metrics --filter [component] --last 5m
+
+# Verify no new errors in logs
+stella [component] logs --tail 20 --level error
+```
+
+---
+
+## Prevention
+
+How to prevent this failure from recurring:
+
+- [ ] **Monitoring:** [e.g., "Add alert for queue depth > 100"]
+- [ ] **Configuration:** [e.g., "Increase worker count in high-volume environments"]
+- [ ] **Code change:** [e.g., "Implement circuit breaker for external service calls"]
+- [ ] **Documentation:** [e.g., "Update capacity planning guide"]
+
+---
+
+## Related Resources
+
+- **Architecture doc:** [Link to relevant architecture documentation]
+- **Related runbooks:** [Links to related failure scenarios]
+- **Doctor check source:** [Link to Doctor check implementation]
+- **Grafana dashboard:** [Link to relevant dashboard]
+
+---
+
+## Revision History
+
+| Date | Author | Changes |
+|------|--------|---------|
+| YYYY-MM-DD | [Name] | Initial version |
--- a/docs/operations/runbooks/attestor-hsm-connection.md
+++ b/docs/operations/runbooks/attestor-hsm-connection.md
@@ -0,0 +1,193 @@
+# Runbook: Attestor - HSM Connection Issues
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage  
+> **Task:** RUN-005 - Attestor Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Attestor / Cryptography |
+| **Severity** | Critical |
+| **On-call scope** | Platform team, Security team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.crypto.hsm-availability` |
+
+---
+
+## Symptoms
+
+- [ ] Signing operations failing with "HSM unavailable"
+- [ ] Alert `AttestorHsmConnectionFailed` firing
+- [ ] Error: "PKCS#11 operation failed" or "HSM session timeout"
+- [ ] Attestations cannot be created
+- [ ] Key operations (sign, verify) failing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | No attestations can be signed; releases blocked |
+| **Data integrity** | Keys are safe in HSM; operations resume when connection restored |
+| **SLA impact** | All signing operations blocked; compliance posture at risk |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.crypto.hsm-availability
+   ```
+
+2. **Check HSM connection status:**
+   ```bash
+   stella crypto hsm status
+   ```
+
+3. **Test HSM connectivity:**
+   ```bash
+   stella crypto hsm test
+   ```
+
+### Deep diagnosis
+
+1. **Check PKCS#11 library status:**
+   ```bash
+   stella crypto hsm pkcs11-status
+   ```
+   Look for: Library loaded, slot available, session active
+
+2. **Check HSM network connectivity:**
+   ```bash
+   stella crypto hsm ping
+   ```
+
+3. **Check HSM session logs:**
+   ```bash
+   stella crypto hsm logs --last 30m
+   ```
+   Look for: Session errors, timeouts, authentication failures
+
+4. **Check HSM slot status:**
+   ```bash
+   stella crypto hsm slots list
+   ```
+   Problem if: Slot not found, slot busy, token not present
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Attempt HSM reconnection:**
+   ```bash
+   stella crypto hsm reconnect
+   ```
+
+2. **If HSM unreachable, switch to software signing (if permitted):**
+   ```bash
+   stella attest config set signing.mode software
+   stella attest reload
+   ```
+   **Warning:** Software signing may not meet compliance requirements
+
+3. **Use backup HSM if configured:**
+   ```bash
+   stella crypto hsm failover --to backup
+   ```
+
+### Root cause fix
+
+**If network connectivity issue:**
+
+1. Check HSM network path:
+   ```bash
+   stella crypto hsm connectivity --verbose
+   ```
+
+2. Verify firewall rules allow the HSM client port (for example, 1792 for Luna Network HSM NTLS; confirm other ports in your HSM vendor's documentation)
+
+3. Check HSM server status with vendor tools
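+
+To separate network problems from HSM-side problems, you can test plain TCP reachability of the HSM port from the Attestor host. The sketch below uses bash's `/dev/tcp` redirection and coreutils `timeout`; the function name and any host you pass are hypothetical, and a successful connection only shows that the port is open, not that the TLS handshake or PKCS#11 session will succeed.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# $1 = HSM host, $2 = port, $3 = timeout in seconds (default 5).
+hsm_port_reachable() {
+  # Host and port are passed as positional args to avoid shell injection.
+  if timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
+    echo "reachable: $1:$2"
+  else
+    echo "unreachable: $1:$2"
+    return 1
+  fi
+}
+```
+
+For example, `hsm_port_reachable hsm01.example.internal 1792` run from the Attestor host; if it reports `unreachable` while the HSM is up, focus on firewall rules and routing before HSM credentials or PKCS#11 configuration.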
+
+**If session timeout:**
+
+1. Increase session timeout:
+   ```bash
+   stella crypto hsm config set session.timeout 300s
+   stella crypto hsm reconnect
+   ```
+
+2. Enable session keep-alive:
+   ```bash
+   stella crypto hsm config set session.keepalive true
+   stella crypto hsm config set session.keepalive_interval 60s
+   ```
+
+**If authentication failed:**
+
+1. Verify HSM credentials:
+   ```bash
+   stella crypto hsm auth verify
+   ```
+
+2. Update HSM PIN if changed:
+   ```bash
+   stella crypto hsm auth update --slot <slot-id>
+   ```
+
+**If PKCS#11 library issue:**
+
+1. Verify library path:
+   ```bash
+   stella crypto hsm config get pkcs11.library_path
+   ```
+
+2. Reload PKCS#11 library:
+   ```bash
+   stella crypto hsm pkcs11-reload
+   ```
+
+3. Check library compatibility:
+   ```bash
+   stella crypto hsm pkcs11-info
+   ```
+
+### Verification
+
+```bash
+# Test HSM connectivity
+stella crypto hsm test
+
+# Test signing operation
+stella attest test-sign
+
+# Verify key access
+stella keys verify <key-id> --operation sign
+
+# Check no errors in logs
+stella crypto hsm logs --level error --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Redundancy:** Configure backup HSM for failover
+- [ ] **Monitoring:** Alert on HSM connection failures immediately
+- [ ] **Keep-alive:** Enable session keep-alive to prevent timeouts
+- [ ] **Testing:** Include HSM health in regular health checks
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/cryptography/hsm-integration.md`
+- **Related runbooks:** `attestor-signing-failed.md`, `crypto-ops.md`
+- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Crypto/`
+- **HSM setup:** `docs/operations/hsm-configuration.md`
--- a/docs/operations/runbooks/attestor-key-expired.md
+++ b/docs/operations/runbooks/attestor-key-expired.md
@@ -0,0 +1,190 @@
+# Runbook: Attestor - Signing Key Expired
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-005 - Attestor Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Attestor |
+| **Severity** | Critical |
+| **On-call scope** | Platform team, Security team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.attestor.key-expiration` |
+
+---
+
+## Symptoms
+
+- [ ] Attestation creation failing with "key expired" error
+- [ ] Alert `AttestorKeyExpired` firing
+- [ ] Error: "signing key certificate has expired"
+- [ ] New attestations cannot be created
+- [ ] Verification of new attestations failing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | No new attestations can be signed; releases blocked |
+| **Data integrity** | Existing attestations remain valid; new ones cannot be created |
+| **SLA impact** | Release SLO violated; compliance posture compromised |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.attestor.key-expiration
+   ```
+
+2. **List signing keys and expiration:**
+   ```bash
+   stella keys list --type signing --show-expiration
+   ```
+   Look for: Keys with status "expired" or expiring soon
+
+3. **Check active signing key:**
+   ```bash
+   stella attest config get signing.key_id
+   stella keys show <key-id> --details
+   ```
+
+### Deep diagnosis
+
+1. **Check certificate chain validity:**
+   ```bash
+   stella crypto cert verify-chain --key <key-id>
+   ```
+   Problem if: Any certificate in chain expired
+
+2. **Check for backup keys:**
+   ```bash
+   stella keys list --type signing --status inactive
+   ```
+   Look for: Unexpired backup keys that can be activated
+
+3. **Check key rotation history:**
+   ```bash
+   stella keys rotation-history --key <key-id>
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **If backup key available, activate it:**
+   ```bash
+   stella keys activate <backup-key-id>
+   stella attest config set signing.key_id <backup-key-id>
+   stella attest reload
+   ```
+
+2. **Verify signing works:**
+   ```bash
+   stella attest test-sign
+   ```
+
+3. **Retry failed attestations:**
+   ```bash
+   stella attest retry --failed --last 1h
+   ```
+
+### Root cause fix
+
+**Generate new signing key:**
+
+1. Generate new key pair:
+   ```bash
+   stella keys generate \
+     --type signing \
+     --algorithm ecdsa-p256 \
+     --validity 365d \
+     --name "signing-key-$(date +%Y%m%d)"
+   ```
+
+2. If using HSM:
+   ```bash
+   stella keys generate \
+     --type signing \
+     --algorithm ecdsa-p256 \
+     --validity 365d \
+     --hsm-slot <slot> \
+     --name "signing-key-$(date +%Y%m%d)"
+   ```
+
+3. Register the new key:
+   ```bash
+   stella keys register <new-key-id> --purpose attestation-signing
+   ```
+
+4. Update signing configuration:
+   ```bash
+   stella attest config set signing.key_id <new-key-id>
+   stella attest reload
+   ```
+
+5. Publish new public key to trust anchors:
+   ```bash
+   stella issuer keys publish <new-key-id>
+   ```
+
+**Configure automatic rotation:**
+
+1. Enable auto-rotation:
+   ```bash
+   stella keys config set rotation.auto true
+   stella keys config set rotation.before_expiry 30d
+   stella keys config set rotation.overlap_days 14
+   ```
+
+2. Set up rotation alerts:
+   ```bash
+   stella keys config set alerts.expiring_days 30
+   stella keys config set alerts.expiring_days_critical 7
+   ```
+
+### Verification
+
+```bash
+# Verify new key is active
+stella keys list --type signing --status active
+
+# Test signing
+stella attest test-sign
+
+# Create test attestation
+stella attest create --type test --subject "test:key-rotation"
+
+# Verify the attestation
+stella verify attestation --last
+
+# Check key expiration
+stella keys show <new-key-id> --details | grep -i expir
+```
+
+---
+
+## Prevention
+
+- [ ] **Rotation:** Enable automatic key rotation 30 days before expiry
+- [ ] **Monitoring:** Alert on keys expiring within 30 days (warning) and 7 days (critical)
+- [ ] **Backup:** Maintain at least one backup signing key
+- [ ] **Documentation:** Document key rotation procedures and approval process
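+
+The warning and critical thresholds above can be expressed as a small classifier, for example in a scheduled job that feeds an external alerting system. This is a sketch under stated assumptions: it takes the key's expiry as a Unix timestamp, and the function names are hypothetical.
+
+```bash
+# Hypothetical helper: whole days from now until expiry (negative once expired).
+days_remaining() {
+  local expiry_epoch="$1" now_epoch="${2:-$(date +%s)}"
+  local diff=$(( expiry_epoch - now_epoch ))
+  if (( diff >= 0 )); then
+    echo $(( diff / 86400 ))
+  else
+    # Round toward negative infinity so a key expired one second ago reports -1.
+    echo $(( (diff - 86399) / 86400 ))
+  fi
+}
+
+# Hypothetical helper: map days remaining to a severity (defaults: warning 30, critical 7).
+expiry_severity() {
+  local days="$1" warn="${2:-30}" crit="${3:-7}"
+  if (( days < 0 )); then
+    echo "expired"
+  elif (( days <= crit )); then
+    echo "critical"
+  elif (( days <= warn )); then
+    echo "warning"
+  else
+    echo "ok"
+  fi
+}
+
+# Usage: expiry_severity "$(days_remaining "$expiry_epoch")"
+```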
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/attestor/architecture.md`
+- **Related runbooks:** `attestor-signing-failed.md`, `attestor-hsm-connection.md`
+- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Attestor/`
+- **Key management:** `docs/operations/key-management.md`
--- a/docs/operations/runbooks/attestor-rekor-unavailable.md
+++ b/docs/operations/runbooks/attestor-rekor-unavailable.md
@@ -0,0 +1,184 @@
+# Runbook: Attestor - Rekor Transparency Log Unreachable
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-005 - Attestor Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Attestor |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.attestor.rekor-connectivity` |
+
+---
+
+## Symptoms
+
+- [ ] Attestation transparency logging failing
+- [ ] Alert `AttestorRekorUnavailable` firing
+- [ ] Error: "Rekor server unavailable" or "transparency log submission failed"
+- [ ] Attestations created but not anchored to transparency log
+- [ ] Verification failing due to missing log entry
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Attestations not publicly verifiable via transparency log |
+| **Data integrity** | Attestations still valid locally; transparency reduced |
+| **SLA impact** | Compliance may require transparency log anchoring |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.attestor.rekor-connectivity
+   ```
+
+2. **Check Rekor connectivity:**
+   ```bash
+   stella attest rekor status
+   ```
+
+3. **Test Rekor endpoint:**
+   ```bash
+   stella attest rekor ping
+   ```
+
+### Deep diagnosis
+
+1. **Check Rekor server URL:**
+   ```bash
+   stella attest config get rekor.url
+   ```
+   Default: https://rekor.sigstore.dev
+
+2. **Check for public Rekor outage:**
+   ```bash
+   stella attest rekor api-status
+   ```
+   Also check: https://status.sigstore.dev/
+
+3. **Check network/proxy issues:**
+   ```bash
+   stella attest rekor test --verbose
+   ```
+   Look for: TLS errors, proxy blocks, timeouts
+
+4. **Check pending log entries:**
+   ```bash
+   stella attest rekor pending-entries
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Queue attestations for later submission:**
+   ```bash
+   stella attest config set rekor.queue_on_failure true
+   stella attest reload
+   ```
+
+2. **Disable Rekor requirement temporarily:**
+   ```bash
+   stella attest config set rekor.required false
+   stella attest reload
+   ```
+   **Warning:** Reduces transparency guarantees
+
+3. **Use private Rekor instance if available:**
+   ```bash
+   stella attest config set rekor.url https://rekor.internal.example.com
+   stella attest reload
+   ```
+
+### Root cause fix
+
+**If public Rekor outage:**
+
+1. Wait for Sigstore to resolve the issue
+2. Check status at https://status.sigstore.dev/
+3. Process queued entries when service recovers:
+   ```bash
+   stella attest rekor process-queue
+   ```
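+
+Rather than polling by hand, the wait can be scripted: retry a connectivity check with exponential backoff and drain the queue once it succeeds. `retry_with_backoff` is a hypothetical generic helper; the attempt count and delays in the usage line are illustrative, not recommended values.
+
+```bash
+# Hypothetical helper: run a command until it succeeds, doubling the delay between attempts.
+# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
+retry_with_backoff() {
+  local max="$1" delay="$2"
+  shift 2
+  local attempt=1
+  until "$@"; do
+    if (( attempt >= max )); then
+      echo "giving up after $attempt attempts" >&2
+      return 1
+    fi
+    sleep "$delay"
+    delay=$(( delay * 2 ))
+    attempt=$(( attempt + 1 ))
+  done
+}
+
+# 8 attempts starting at 30s waits 30s, 60s, ... 1920s (about an hour in total):
+# retry_with_backoff 8 30 stella attest rekor ping && stella attest rekor process-queue
+```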
+
+**If network/firewall issue:**
+
+1. Verify outbound HTTPS to rekor.sigstore.dev:
+   ```bash
+   stella attest rekor connectivity --verbose
+   ```
+
+2. Configure proxy if required:
+   ```bash
+   stella attest config set rekor.proxy https://proxy:8080
+   ```
+
+3. Add the Sigstore endpoints to the firewall allowlist:
+   - rekor.sigstore.dev:443
+   - fulcio.sigstore.dev:443 (for certificate issuance)
+
+**If TLS certificate issue:**
+
+1. Check certificate validity:
+   ```bash
+   stella attest rekor cert-check
+   ```
+
+2. Update CA certificates:
+   ```bash
+   stella crypto ca update
+   ```
+
+**If private Rekor instance issue:**
+
+1. Check private Rekor server status
+2. Verify Rekor database health
+3. Check Rekor signer availability
+
+### Verification
+
+```bash
+# Test Rekor connectivity
+stella attest rekor ping
+
+# Submit test entry
+stella attest rekor test-submit
+
+# Process any queued entries
+stella attest rekor process-queue
+
+# Verify recent attestation in log
+stella attest rekor lookup --attestation <attestation-id>
+```
+
+---
+
+## Prevention
+
+- [ ] **Redundancy:** Configure private Rekor instance as fallback
+- [ ] **Queuing:** Enable queue-on-failure for resilience
+- [ ] **Monitoring:** Alert on Rekor submission failures
+- [ ] **Offline:** Document attestation validity without Rekor for air-gap scenarios
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/attestor/transparency-log.md`
+- **Related runbooks:** `attestor-signing-failed.md`, `attestor-verification-failed.md`
+- **Sigstore docs:** https://docs.sigstore.dev/
+- **Rekor setup:** `docs/operations/rekor-configuration.md`
--- a/docs/operations/runbooks/attestor-signing-failed.md
+++ b/docs/operations/runbooks/attestor-signing-failed.md
@@ -0,0 +1,176 @@
+# Runbook: Attestor - Signature Generation Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-005 - Attestor Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Attestor |
+| **Severity** | Critical |
+| **On-call scope** | Platform team, Security team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.attestor.signing-health` |
+
+---
+
+## Symptoms
+
+- [ ] Attestation requests failing with "signing failed" error
+- [ ] Alert `AttestorSigningFailed` firing
+- [ ] Evidence bundles missing signatures
+- [ ] Metric `attestor_signing_failures_total` increasing
+- [ ] Release pipeline blocked due to unsigned attestations
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Releases blocked; attestations cannot be created |
+| **Data integrity** | Evidence is recorded but unsigned; can be signed later |
+| **SLA impact** | Release SLO violated; evidence integrity compromised |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.attestor.signing-health
+   ```
+
+2. **Check attestor service status:**
+   ```bash
+   stella attest status
+   ```
+
+3. **Check signing key availability:**
+   ```bash
+   stella keys list --type signing --status active
+   ```
+   Problem if: No active signing keys
+
+### Deep diagnosis
+
+1. **Test signing operation:**
+   ```bash
+   stella attest test-sign --verbose
+   ```
+   Look for: Specific error message
+
+2. **Check key material access:**
+   ```bash
+   stella keys verify <key-id> --operation sign
+   ```
+
+3. **If using HSM, check HSM connectivity:**
+   ```bash
+   stella doctor --check check.crypto.hsm-availability
+   ```
+
+4. **Check for key expiration:**
+   ```bash
+   stella keys list --expiring-within 7d
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **If key expired, rotate to backup key:**
+   ```bash
+   stella keys activate <backup-key-id>
+   stella attest config set signing.key_id <backup-key-id>
+   ```
+
+2. **If HSM unavailable, switch to software signing (temporary):**
+   ```bash
+   stella attest config set signing.mode software
+   stella attest reload
+   ```
+   ⚠️ **Warning:** Software signing may not meet compliance requirements
+
+3. **Retry failed attestations:**
+   ```bash
+   stella attest retry --failed --last 1h
+   ```
+
+### Root cause fix
+
+**If key expired:**
+
+1. Generate new signing key:
+   ```bash
+   stella keys generate --type signing --algorithm ecdsa-p256
+   ```
+
+2. Configure key rotation schedule:
+   ```bash
+   stella keys config set rotation.auto true
+   stella keys config set rotation.overlap_days 14
+   ```
+
+**If HSM connection failed:**
+
+1. Verify HSM configuration:
+   ```bash
+   stella crypto hsm verify
+   ```
+
+2. Restart HSM connection:
+   ```bash
+   stella crypto hsm reconnect
+   ```
+
+**If certificate chain issue:**
+
+1. Verify certificate chain:
+   ```bash
+   stella crypto cert verify-chain --key <key-id>
+   ```
+
+2. Update intermediate certificates:
+   ```bash
+   stella crypto cert update-chain --key <key-id>
+   ```
+
+### Verification
+
+```bash
+# Test signing
+stella attest test-sign
+
+# Create test attestation
+stella attest create --type test --subject "test:verification"
+
+# Verify the attestation
+stella verify attestation --last
+
+# Check no failures in recent operations
+stella attest logs --level error --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Key rotation:** Enable automatic key rotation with 14-day overlap
+- [ ] **Monitoring:** Alert on keys expiring within 30 days
+- [ ] **Backup:** Maintain backup signing key in different HSM slot
+- [ ] **Testing:** Include signing test in health check schedule
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/attestor/architecture.md`
+- **Related runbooks:** `attestor-key-expired.md`, `attestor-hsm-connection.md`
+- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Attestor/`
+- **Dashboard:** Grafana > Stella Ops > Attestor
--- a/docs/operations/runbooks/attestor-verification-failed.md
+++ b/docs/operations/runbooks/attestor-verification-failed.md
@@ -0,0 +1,195 @@
+# Runbook: Attestor - Attestation Verification Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-005 - Attestor Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Attestor |
+| **Severity** | High |
+| **On-call scope** | Platform team, Security team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.attestor.verification-health` |
+
+---
+
+## Symptoms
+
+- [ ] Attestation verification failing
+- [ ] Alert `AttestorVerificationFailed` firing
+- [ ] Error: "signature verification failed" or "invalid attestation"
+- [ ] Promotions blocked due to failed verification
+- [ ] Error: "trust anchor not found" or "certificate chain invalid"
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Artifacts cannot be promoted; release blocked |
+| **Data integrity** | May indicate tampered attestation or configuration issue |
+| **SLA impact** | Release pipeline blocked until resolved |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.attestor.verification-health
+   ```
+
+2. **Verify specific attestation:**
+   ```bash
+   stella verify attestation --attestation <attestation-id> --verbose
+   ```
+
+3. **Check trust anchors:**
+   ```bash
+   stella trust-anchors list
+   ```
+
+### Deep diagnosis
+
+1. **Check attestation details:**
+   ```bash
+   stella attest show <attestation-id> --details
+   ```
+   Look for: Signer identity, timestamp, subject
+
+2. **Verify certificate chain:**
+   ```bash
+   stella verify cert-chain --attestation <attestation-id>
+   ```
+   Problem if: Intermediate cert missing, root not trusted
+
+3. **Check public key availability:**
+   ```bash
+   stella keys show <key-id> --public
+   ```
+
+4. **Check if issuer is trusted:**
+   ```bash
+   stella issuer trust-status <issuer-id>
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **If trust anchor missing, add it:**
+   ```bash
+   stella trust-anchors add --cert <issuer-cert.pem>
+   ```
+
+2. **If intermediate cert missing:**
+   ```bash
+   stella trust-anchors add-intermediate --cert <intermediate.pem>
+   ```
+
+3. **Re-verify with verbose output:**
+   ```bash
+   stella verify attestation --attestation <attestation-id> --verbose
+   ```
+
+### Root cause fix
+
+**If signature mismatch:**
+
+1. Check attestation wasn't modified:
+   ```bash
+   stella attest integrity-check <attestation-id>
+   ```
+
+2. If modified, regenerate attestation:
+   ```bash
+   stella attest create --subject <digest> --type <type> --force
+   ```
+
+**If key rotated and old key not trusted:**
+
+1. Add old public key to trust anchors:
+   ```bash
+   stella trust-anchors add-key --key <old-key.pem> --expires <date>
+   ```
+
+2. Or fetch from issuer directory:
+   ```bash
+   stella issuer keys fetch <issuer-id>
+   ```
+
+**If certificate expired:**
+
+1. Check certificate validity:
+   ```bash
+   stella verify cert --attestation <attestation-id> --show-expiry
+   ```
+
+2. Re-sign with valid certificate:
+   ```bash
+   stella attest resign <attestation-id>
+   ```
+
+**If issuer not trusted:**
+
+1. Verify issuer identity:
+   ```bash
+   stella issuer show <issuer-id>
+   ```
+
+2. Add to trusted issuers (requires approval):
+   ```bash
+   stella issuer trust <issuer-id> --reason "Approved by security team"
+   ```
+
+**If algorithm not supported:**
+
+1. Check algorithm:
+   ```bash
+   stella attest show <attestation-id> | grep algorithm
+   ```
+
+2. Verify crypto provider supports algorithm:
+   ```bash
+   stella crypto providers list --algorithms
+   ```
+
+### Verification
+
+```bash
+# Verify attestation
+stella verify attestation --attestation <attestation-id>
+
+# Verify trust chain
+stella verify cert-chain --attestation <attestation-id>
+
+# Test end-to-end verification
+stella verify artifact --digest <digest>
+
+# Check no verification errors
+stella attest logs --filter "verification" --level error --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Trust anchors:** Keep trust anchor list current with all valid issuer certs
+- [ ] **Key rotation:** Plan key rotation with overlap period for verification continuity
+- [ ] **Monitoring:** Alert on verification failure rate > 0
+- [ ] **Testing:** Include verification tests in release pipeline
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/attestor/verification.md`
+- **Related runbooks:** `attestor-signing-failed.md`, `attestor-key-expired.md`
+- **Trust management:** `docs/operations/trust-anchors.md`
--- a/docs/operations/runbooks/backup-restore-ops.md
+++ b/docs/operations/runbooks/backup-restore-ops.md
@@ -0,0 +1,449 @@
+# Backup and Restore Operations Runbook
+
+> **Sprint:** SPRINT_20260117_029_Runbook_coverage_expansion
+> **Task:** RUN-004 - Backup/Restore Runbook
+
+Status: PRODUCTION-READY (2026-01-17 UTC)
+
+## Scope
+Comprehensive backup and restore procedures for all Stella Ops components including database, evidence locker, configuration, and secrets.
+
+---
+
+## Backup Architecture Overview
+
+### Backup Components
+
+| Component | Backup Type | Default Schedule | Retention |
+|-----------|-------------|------------------|-----------|
+| PostgreSQL | Full + WAL | Daily full, continuous WAL | 30 days |
+| Evidence Locker | Incremental | Daily | 90 days |
+| Configuration | Snapshot | Daily + on change | 90 days |
+| Secrets | Encrypted snapshot | Daily | 30 days |
+| Attestation Keys | Encrypted export | Weekly | 1 year |
+
+### Storage Locations
+
+- **Primary:** `/var/lib/stellaops/backups/` (local)
+- **Secondary:** S3/Azure Blob/GCS (configurable)
+- **Offline:** Removable media for air-gap scenarios
+
+---
+
+## Pre-flight Checklist
+
+### Environment Verification
+```bash
+# Check backup service status
+stella backup status
+
+# Verify backup storage
+stella doctor --check check.storage.backup
+
+# List recent backups
+stella backup list --last 7d
+
+# Test backup restore capability
+stella backup test-restore --latest --dry-run
+```
+
+### Metrics to Watch
+- `stella_backup_last_success_timestamp` - Last successful backup
+- `stella_backup_duration_seconds` - Backup duration
+- `stella_backup_size_bytes` - Backup size
+- `stella_restore_test_last_success` - Last restore test
+
+---
+
+## Standard Procedures
+
+### SP-001: Create Manual Backup
+
+**When:** Before upgrades, schema changes, or major configuration changes
+
+**Duration:** 5-30 minutes depending on data volume
+
+1. Create full system backup. Store the name in a variable so steps 3 and 4 refer to the same backup even if the procedure runs past midnight (each `$(date +%Y%m%d)` call is evaluated separately):
+   ```bash
+   BACKUP_NAME="pre-upgrade-$(date +%Y%m%d)"
+   stella backup create --full --name "$BACKUP_NAME"
+   ```
+
+2. Or create component-specific backup:
+   ```bash
+   # Database only
+   stella backup create --type database --name "db-pre-migration"
+   
+   # Evidence locker only
+   stella backup create --type evidence --name "evidence-snapshot"
+   
+   # Configuration only
+   stella backup create --type config --name "config-backup"
+   ```
+
+3. Verify backup (in the same shell session, so `BACKUP_NAME` is still set):
+   ```bash
+   stella backup verify --name "$BACKUP_NAME"
+   ```
+
+4. Copy to offsite storage (recommended):
+   ```bash
+   stella backup copy --name "$BACKUP_NAME" --destination s3://backup-bucket/
+   ```
+
+### SP-002: Verify Backup Integrity
+
+**Frequency:** Weekly
+
+**Duration:** 15-60 minutes
+
+1. List backups for verification:
+   ```bash
+   stella backup list --unverified
+   ```
+
+2. Verify backup integrity:
+   ```bash
+   # Verify specific backup
+   stella backup verify --name <backup-name>
+   
+   # Verify all unverified
+   stella backup verify --all-unverified
+   ```
+
+3. Test restore (non-destructive):
+   ```bash
+   stella backup test-restore --name <backup-name> --target /tmp/restore-test
+   ```
+
+4. Record verification result:
+   ```bash
+   stella backup log-verification --name <backup-name> --result success
+   ```
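+
+`stella backup verify` remains the supported check. When backups are also copied offsite or to removable media, recording independent checksums lets you confirm that each copy arrived intact. A minimal sketch using GNU coreutils; the helper names and the `SHA256SUMS` file name are assumptions. It verifies bytes only, so the test restore in step 3 is still needed to prove a backup is restorable.
+
+```bash
+# Hypothetical helper: record SHA-256 checksums for every file in a backup directory.
+write_manifest() {
+  local dir="$1"
+  # Exclude the manifest itself (the redirect may create it before find runs).
+  ( cd "$dir" && find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum > SHA256SUMS )
+}
+
+# Hypothetical helper: re-check the manifest; non-zero exit if any file changed or is missing.
+verify_manifest() {
+  local dir="$1"
+  ( cd "$dir" && sha256sum --quiet -c SHA256SUMS )
+}
+
+# Usage: write_manifest before copying, then verify_manifest on the destination copy.
+```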
+
+### SP-003: Restore from Backup
+
+**CAUTION: This is a destructive operation**
+
+#### Full System Restore
+
+1. Stop all services:
+   ```bash
+   stella service stop --all
+   ```
+
+2. List available backups:
+   ```bash
+   stella backup list --type full
+   ```
+
+3. Restore:
+   ```bash
+   # Dry run first
+   stella backup restore --name <backup-name> --dry-run
+   
+   # Execute restore
+   stella backup restore --name <backup-name> --confirm
+   ```
+
+4. Start services:
+   ```bash
+   stella service start --all
+   ```
+
+5. Verify restoration:
+   ```bash
+   stella doctor --all
+   stella service health
+   ```
+
+#### Component-Specific Restore
+
+1. Database restore:
+   ```bash
+   stella service stop --service api,release-orchestrator
+   stella backup restore --type database --name <backup-name> --confirm
+   stella db migrate  # Apply any pending migrations
+   stella service start --service api,release-orchestrator
+   ```
+
+2. Evidence locker restore:
+   ```bash
+   stella backup restore --type evidence --name <backup-name> --confirm
+   stella evidence verify --mode quick
+   ```
+
+3. Configuration restore:
+   ```bash
+   stella backup restore --type config --name <backup-name> --confirm
+   stella service restart --graceful
+   ```
+
+### SP-004: Point-in-Time Recovery (Database)
+
+1. Identify target recovery point:
+   ```bash
+   # List WAL archives
+   stella backup wal-list --after <start-date> --before <end-date>
+   ```
+
+2. Perform PITR:
+   ```bash
+   stella backup restore-pitr --to-time "2026-01-17T10:30:00Z" --confirm
+   ```
+
+3. Verify data state:
+   ```bash
+   stella db verify-integrity
+   ```
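+
+`--to-time` takes a UTC timestamp in the `...Z` form. If the incident timeline was recorded in local time, GNU `date` can convert it. A minimal sketch, assuming GNU coreutils (BSD/macOS `date` uses different flags); the helper name is hypothetical.
+
+```bash
+# Hypothetical helper: convert a timestamp with an explicit UTC offset to ISO 8601 UTC.
+to_utc_iso8601() {
+  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
+}
+
+# 12:30 at UTC+02:00 is the 10:30Z recovery point used in step 2:
+# stella backup restore-pitr --to-time "$(to_utc_iso8601 '2026-01-17 12:30:00 +0200')" --confirm
+```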
+
+---
+
+## Backup Schedules
+
+### Configure Backup Schedule
+
+```bash
+# View current schedule
+stella backup schedule show
+
+# Set database backup schedule
+stella backup schedule set --type database --cron "0 2 * * *"
+
+# Set evidence backup schedule
+stella backup schedule set --type evidence --cron "0 3 * * *"
+
+# Set configuration backup schedule
+stella backup schedule set --type config --cron "0 4 * * *" --on-change
+```
+
+### Retention Policy
+
+```bash
+# View retention policy
+stella backup retention show
+
+# Set retention
+stella backup retention set --type database --days 30
+stella backup retention set --type evidence --days 90
+stella backup retention set --type config --days 90
+
+# Apply retention (cleanup old backups)
+stella backup retention apply
+```
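+
+For backups kept as plain files outside the Stella-managed store (for example, copies on offline media), the same age-based policy can be approximated with `find`. This is a hypothetical helper, not what `stella backup retention apply` does internally, and it defaults to a dry run. Note that `-mtime +N` matches files whose age, rounded down to whole days, is greater than N.
+
+```bash
+# Hypothetical helper: list (or, with "delete", remove) files older than N days.
+# Only the top level of the directory is considered.
+prune_backups() {
+  local dir="$1" days="$2" mode="${3:-dry-run}"
+  if [ "$mode" = "delete" ]; then
+    find "$dir" -maxdepth 1 -type f -mtime +"$days" -print -delete
+  else
+    find "$dir" -maxdepth 1 -type f -mtime +"$days" -print
+  fi
+}
+
+# Review first, then delete:
+# prune_backups /media/usb 90
+# prune_backups /media/usb 90 delete
+```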
+
+---
+
+## Incident Procedures
+
+### INC-001: Backup Failure
+
+**Symptoms:**
+- Alert: `StellaBackupFailed`
+- Missing recent backup
+
+**Investigation:**
+```bash
+# Check backup logs
+stella backup logs --last 24h
+
+# Check disk space
+stella doctor --check check.storage.diskspace,check.storage.backup
+
+# Test backup operation
+stella backup test --type database
+```
+
+**Resolution:**
+
+1. **Disk space issue:**
+   ```bash
+   stella backup retention apply --force
+   stella backup cleanup --expired
+   ```
+
+2. **Database connectivity:**
+   ```bash
+   stella doctor --check check.postgres.connectivity
+   ```
+
+3. **Permission issue:**
+   - Check backup directory permissions
+   - Verify service account access
+
+4. **Retry backup:**
+   ```bash
+   stella backup create --type <failed-type> --retry
+   ```
+
+### INC-002: Restore Failure
+
+**Symptoms:**
+- Restore command fails
+- Services not starting after restore
+
+**Investigation:**
+```bash
+# Check restore logs
+stella backup restore-logs --last-attempt
+
+# Verify backup integrity
+stella backup verify --name <backup-name>
+
+# Check disk space
+stella doctor --check check.storage.diskspace
+```
+
+**Resolution:**
+
+1. **Corrupted backup:**
+   ```bash
+   # Try previous backup
+   stella backup list --type <type>
+   stella backup restore --name <previous-backup> --confirm
+   ```
+
+2. **Version mismatch:**
+   ```bash
+   # Check backup version
+   stella backup info --name <backup-name>
+   
+   # Restore with migration
+   stella backup restore --name <backup-name> --with-migration
+   ```
+
+3. **Disk space:**
+   - Free space or expand volume
+   - Restore to alternate location
+
+### INC-003: Backup Storage Full
+
+**Symptoms:**
+- Alert: `StellaBackupStorageFull`
+- New backups failing
+
+**Immediate Actions:**
+```bash
+# Check storage
+stella backup storage stats
+
+# Emergency cleanup
+stella backup cleanup --keep-last 3
+
+# Delete specific old backups
+stella backup delete --older-than 14d --confirm
+```
+
+**Resolution:**
+
+1. **Adjust retention:**
+   ```bash
+   stella backup retention set --type database --days 14
+   stella backup retention apply
+   ```
+
+2. **Expand storage:**
+   - Add disk space
+   - Configure offsite storage
+
+3. **Archive to cold storage:**
+   ```bash
+   stella backup archive --older-than 30d --destination s3://archive-bucket/
+   ```
+
+---
+
+## Disaster Recovery Scenarios
+
+### DR-001: Complete System Loss
+
+1. Provision new infrastructure
+2. Install Stella Ops
+3. Restore from offsite backup:
+   ```bash
+   stella backup restore --source s3://backup-bucket/latest-full.tar.gz --confirm
+   ```
+4. Verify all components
+5. Update DNS/load balancer
+
+### DR-002: Database Corruption
+
+1. Stop services
+2. Restore database from latest clean backup:
+   ```bash
+   stella backup restore --type database --name <last-known-good>
+   ```
+3. Replay WAL up to a point just before the corruption occurred, using point-in-time recovery (SP-004)
+4. Verify data integrity
+5. Resume services
+
+### DR-003: Evidence Locker Loss
+
+1. Restore evidence from backup:
+   ```bash
+   stella backup restore --type evidence --name <backup-name>
+   ```
+2. Rebuild index:
+   ```bash
+   stella evidence index rebuild
+   ```
+3. Verify anchor chain:
+   ```bash
+   stella evidence anchor verify --all
+   ```
+
+---
+
+## Offline/Air-Gap Backup
+
+### Creating Offline Backup
+
+```bash
+# Create encrypted offline bundle
+stella backup create-offline \
+  --output /media/usb/stellaops-backup-$(date +%Y%m%d).enc \
+  --encrypt \
+  --passphrase-file /secure/backup-key
+
+# Verify offline backup
+stella backup verify-offline --input /media/usb/stellaops-backup-*.enc
+```
+
+### Restoring from Offline Backup
+
+```bash
+# Restore from offline backup
+stella backup restore-offline \
+  --input /media/usb/stellaops-backup-*.enc \
+  --passphrase-file /secure/backup-key \
+  --confirm
+```
+
+---
+
+## Monitoring Dashboard
+
+Access: Grafana → Dashboards → Stella Ops → Backup Status
+
+Key panels:
+- Last backup success time
+- Backup size trend
+- Backup duration
+- Restore test status
+- Storage utilization
+
+---
+
+## Evidence Capture
+
+```bash
+stella backup diagnostics --output /tmp/backup-diag-$(date +%Y%m%dT%H%M%S).tar.gz
+```
+
+---
+
+## Escalation Path
+
+1. **L1 (On-call):** Retry failed backups, basic troubleshooting
+2. **L2 (Platform team):** Restore operations, schedule adjustments
+3. **L3 (Architecture):** Disaster recovery execution
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/runbooks/connector-ghsa.md
+++ b/docs/operations/runbooks/connector-ghsa.md
@@ -0,0 +1,196 @@
+# Runbook: Feed Connector - GitHub Security Advisories (GHSA) Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-006 - Feed Connector Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Concelier / GHSA Connector |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.connector.ghsa-health` |
+
+---
+
+## Symptoms
+
+- [ ] GHSA feed sync failing or stale
+- [ ] Alert `ConnectorGhsaSyncFailed` firing
+- [ ] Error: "GitHub API rate limit exceeded" or "GraphQL query failed"
+- [ ] GitHub Advisory Database vulnerabilities missing
+- [ ] Metric `connector_sync_failures_total{source="ghsa"}` increasing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Vulnerabilities published in the GitHub Advisory Database (npm, PyPI, Maven, Go, and other ecosystems) may be missed |
+| **Data integrity** | Data becomes stale; no data loss |
+| **SLA impact** | Vulnerability currency SLO violated for ecosystems covered by GHSA |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.connector.ghsa-health
+   ```
+
+2. **Check GHSA sync status:**
+   ```bash
+   stella admin feeds status --source ghsa
+   ```
+
+3. **Test GitHub API connectivity:**
+   ```bash
+   stella connector test ghsa
+   ```
+
+### Deep diagnosis
+
+1. **Check GitHub API rate limit:**
+   ```bash
+   stella connector ghsa rate-limit-status
+   ```
+   Problem if: Remaining = 0, rate limit exceeded
+
+2. **Check GitHub token permissions:**
+   ```bash
+   stella connector credentials show ghsa --check-scopes
+   ```
+   Required scopes: `public_repo`, `read:packages` (for private advisory access)
+
+3. **Check sync logs:**
+   ```bash
+   stella connector logs ghsa --last 1h --level error
+   ```
+   Look for: GraphQL errors, pagination issues, timeouts
+
+4. **Check for GitHub API outage:**
+   ```bash
+   stella connector ghsa api-status
+   ```
+   Also check: https://www.githubstatus.com/
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **If rate limited, wait for reset:**
+   ```bash
+   stella connector ghsa rate-limit-status
+   # Note the reset time, then:
+   stella admin feeds refresh --source ghsa
+   ```
+
+2. **Use secondary token if available:**
+   ```bash
+   stella connector credentials rotate ghsa --to secondary
+   stella admin feeds refresh --source ghsa
+   ```
+
+3. **Load from offline bundle:**
+   ```bash
+   stella offline load --source ghsa --package ghsa-bundle-latest.tar.gz
+   ```
+
+### Root cause fix
+
+**If rate limit consistently exceeded:**
+
+1. Increase sync interval:
+   ```bash
+   stella connector config set ghsa.sync_interval 4h
+   ```
+
+2. Enable incremental sync:
+   ```bash
+   stella connector config set ghsa.incremental_sync true
+   ```
+
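+3. Authenticate requests with a personal access token. GitHub's GraphQL API does not accept unauthenticated requests, and a user token gets 5,000 points per hour: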
+   ```bash
+   stella connector credentials update ghsa --token <github-pat>
+   ```
+
+**If token expired or invalid:**
+
+1. Generate new GitHub PAT at https://github.com/settings/tokens
+
+2. Update token:
+   ```bash
+   stella connector credentials update ghsa --token <new-token>
+   ```
+
+3. Verify scopes:
+   ```bash
+   stella connector credentials show ghsa --check-scopes
+   ```
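+
+Classic personal access tokens report their scopes in the `X-OAuth-Scopes` response header. Fine-grained tokens use permissions instead and do not list scopes there. As a cross-check of `--check-scopes`, the hypothetical helper below tests whether a given scope appears in that header. It is a sketch that works only on the header text, so the network call is shown as a comment.
+
+```bash
+# Hypothetical helper: check whether a classic GitHub PAT grants a given scope.
+# Input: the raw "X-OAuth-Scopes" header line, e.g. obtained with
+#   curl -sI -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user | grep -i '^x-oauth-scopes:'
+# Usage: has_scope "<header line>" <scope>   (exit status 0 if present)
+has_scope() {
+  local header="$1" wanted="$2"
+  # Drop the header name (everything up to the first colon).
+  local scopes="${header#*:}"
+  # One scope per line; strip spaces and the trailing CR that HTTP headers carry.
+  printf '%s\n' "$scopes" | tr ',' '\n' | tr -d ' \r' | grep -qx -- "$wanted"
+}
+```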
+
+**If GraphQL query failing:**
+
+1. Check for API schema changes:
+   ```bash
+   stella connector ghsa schema-check
+   ```
+
+2. Update connector if schema changed:
+   ```bash
+   stella upgrade --component connector-ghsa
+   ```
+
+**If pagination broken:**
+
+1. Reset sync cursor:
+   ```bash
+   stella connector ghsa reset-cursor
+   ```
+
+2. Force full resync:
+   ```bash
+   stella admin feeds refresh --source ghsa --full
+   ```
+
+### Verification
+
+```bash
+# Force sync
+stella admin feeds refresh --source ghsa
+
+# Monitor sync progress
+stella admin feeds status --source ghsa --watch
+
+# Verify recent advisories present
+stella vuln query GHSA-xxxx-xxxx-xxxx  # Use a recent GHSA ID
+
+# Check no errors
+stella connector logs ghsa --level error --last 1h
+```
+
+---
+
+## Prevention
+
+- [ ] **Authentication:** Always authenticate requests; GitHub's GraphQL API requires a token and allows 5,000 points per hour per user token
+- [ ] **Monitoring:** Alert on last sync > 12h or sync failures
+- [ ] **Redundancy:** Use NVD/OSV as backup for GitHub ecosystem coverage
+- [ ] **Token rotation:** Rotate tokens before expiration
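+
+The 12-hour staleness alert in the checklist above reduces to a simple comparison. This is a sketch with a hypothetical helper name. It assumes you have already converted the last successful sync time into epoch seconds, for example from `stella admin feeds status --source ghsa` (whose output format is not documented here) using GNU `date -d <timestamp> +%s`.
+
+```bash
+# Hypothetical helper: report whether a feed is stale.
+# Usage: feed_is_stale <last_sync_epoch> <max_age_hours> [now_epoch]
+# Exit status 0 = stale (alert), 1 = fresh.
+feed_is_stale() {
+  local last="$1" max_hours="$2"
+  local now="${3:-$(date +%s)}"
+  (( now - last > max_hours * 3600 ))
+}
+
+# Example: feed_is_stale "$last_sync_epoch" 12 && echo "GHSA feed is stale"
+```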
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/concelier/connectors.md`
+- **Connector config:** `docs/modules/concelier/operations/connectors/ghsa.md`
+- **Related runbooks:** `connector-nvd.md`, `connector-osv.md`
+- **GitHub API docs:** https://docs.github.com/en/graphql
--- a/docs/operations/runbooks/connector-nvd.md
+++ b/docs/operations/runbooks/connector-nvd.md
@@ -0,0 +1,195 @@
+# Runbook: Feed Connector - NVD Connector Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-006 - Feed Connector Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Concelier / NVD Connector |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.connector.nvd-health` |
+
+---
+
+## Symptoms
+
+- [ ] NVD feed sync failing or stale (> 24h since last successful sync)
+- [ ] Alert `ConnectorNvdSyncFailed` firing
+- [ ] Error: "NVD API request failed" or "rate limit exceeded"
+- [ ] Vulnerability data missing or outdated
+- [ ] Metric `connector_sync_failures_total{source="nvd"}` increasing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Vulnerability scans may miss recent CVEs |
+| **Data integrity** | Data becomes stale; no data loss |
+| **SLA impact** | Vulnerability currency SLO violated (target: < 24h) |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.connector.nvd-health
+   ```
+
+2. **Check NVD sync status:**
+   ```bash
+   stella admin feeds status --source nvd
+   ```
+   Look for: Last sync time, error message, sync state
+
+3. **Check NVD API connectivity:**
+   ```bash
+   stella connector test nvd
+   ```
+
+### Deep diagnosis
+
+1. **Check NVD API key status:**
+   ```bash
+   stella connector credentials show nvd
+   ```
+   Problem if: API key expired or rate limit exhausted
+
+2. **Check NVD API rate limit:**
+   ```bash
+   stella connector nvd rate-limit-status
+   ```
+   Problem if: Remaining requests = 0, reset time in future
+
+3. **Check for NVD API outage:**
+   ```bash
+   stella connector nvd api-status
+   ```
+   Also check: https://nvd.nist.gov/general/news
+
+4. **Check sync logs:**
+   ```bash
+   stella connector logs nvd --last 1h --level error
+   ```
+   Look for: HTTP status codes, timeout errors, parsing failures
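+
+NVD publishes its API 2.0 limits as requests per rolling 30-second window: 5 without an API key and 50 with one. Its guidance also suggests sleeping about six seconds between requests. The hypothetical helper below derives the minimum average spacing implied by each limit. You can use it to judge whether a sync schedule, or a second job sharing the key, could exceed the limit.
+
+```bash
+# Hypothetical helper: minimum average spacing between NVD API 2.0 requests,
+# in seconds, from NVD's published limits (requests per rolling 30 s window).
+# Usage: nvd_min_delay keyed|anonymous
+nvd_min_delay() {
+  local per_window
+  case "$1" in
+    keyed)     per_window=50 ;;   # requests sent with an apiKey header
+    anonymous) per_window=5  ;;   # no API key
+    *) echo "usage: nvd_min_delay keyed|anonymous" >&2; return 2 ;;
+  esac
+  # 30-second window divided by allowed requests; awk keeps the fraction.
+  awk -v n="$per_window" 'BEGIN { printf "%.1f\n", 30 / n }'
+}
+```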
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **If rate limited, wait for reset:**
+   ```bash
+   stella connector nvd rate-limit-status
+   # Wait for reset time, then:
+   stella admin feeds refresh --source nvd
+   ```
+
+2. **If the API key has expired, fall back to anonymous mode (slower: 5 requests per rolling 30-second window instead of 50):**
+   ```bash
+   stella connector config set nvd.api_key_mode anonymous
+   stella admin feeds refresh --source nvd
+   ```
+
+3. **Load from offline bundle if urgent:**
+   ```bash
+   # If you have a recent offline bundle:
+   stella offline load --source nvd --package nvd-bundle-latest.tar.gz
+   ```
+
+### Root cause fix
+
+**If API key expired or invalid:**
+
+1. Generate new NVD API key at https://nvd.nist.gov/developers/request-an-api-key
+
+2. Update API key:
+   ```bash
+   stella connector credentials update nvd --api-key <new-key>
+   ```
+
+3. Verify connectivity:
+   ```bash
+   stella connector test nvd
+   ```
+
+**If rate limit consistently exceeded:**
+
+1. Increase sync interval to reduce API calls:
+   ```bash
+   stella connector config set nvd.sync_interval 6h
+   ```
+
+2. Enable delta sync to reduce data volume:
+   ```bash
+   stella connector config set nvd.delta_sync true
+   ```
+
+3. Request higher rate limit from NVD (if available)
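+
+Delta sync against NVD amounts to querying by last-modified window. NVD's CVE API 2.0 accepts `lastModStartDate` and `lastModEndDate` together, in extended ISO-8601 form, for windows of at most 120 consecutive days. Whether the connector's `delta_sync` option issues exactly this query is an assumption. The sketch below (hypothetical name `nvd_delta_url`, GNU `date` required) only builds such a URL so that you can test a window by hand with `curl`.
+
+```bash
+# Hypothetical helper: build an NVD CVE API 2.0 URL for a last-modified window.
+# Usage: nvd_delta_url <start_epoch> <end_epoch>   (requires GNU date)
+nvd_delta_url() {
+  local start="$1" end="$2"
+  local max_range=$(( 120 * 86400 ))   # NVD rejects windows longer than 120 days
+  if (( end <= start || end - start > max_range )); then
+    echo "window must be positive and at most 120 days" >&2
+    return 1
+  fi
+  local fmt='+%Y-%m-%dT%H:%M:%S.000'
+  local from to
+  from=$(date -u -d "@$start" "$fmt")
+  to=$(date -u -d "@$end" "$fmt")
+  # %2B00:00 is the URL-encoded "+00:00" UTC offset (%% escapes % for printf).
+  printf 'https://services.nvd.nist.gov/rest/json/cves/2.0?lastModStartDate=%s%%2B00:00&lastModEndDate=%s%%2B00:00\n' "$from" "$to"
+}
+```
+
+For example, to fetch the last 24 hours with your key: `curl -s -H "apiKey: $NVD_API_KEY" "$(nvd_delta_url "$(date -d '-1 day' +%s)" "$(date +%s)")"`.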
+
+**If network/firewall issue:**
+
+1. Verify outbound connectivity to NVD API:
+   ```bash
+   stella connector test nvd --verbose
+   ```
+
+2. Check proxy configuration if required:
+   ```bash
+   stella connector config set nvd.proxy https://proxy:8080
+   ```
+
+**If data parsing failures:**
+
+1. Check for NVD schema changes:
+   ```bash
+   stella connector nvd schema-check
+   ```
+
+2. Update connector if schema changed:
+   ```bash
+   stella upgrade --component connector-nvd
+   ```
+
+### Verification
+
+```bash
+# Force sync
+stella admin feeds refresh --source nvd --force
+
+# Monitor sync progress
+stella admin feeds status --source nvd --watch
+
+# Verify recent CVEs are present
+stella vuln query CVE-2026-XXXX  # Use a recent CVE ID
+
+# Check no errors in recent logs
+stella connector logs nvd --level error --last 1h
+```
+
+---
+
+## Prevention
+
+- [ ] **API key:** Always use an API key (not anonymous mode); NVD allows 50 requests per rolling 30-second window with a key versus 5 without, a 10x difference
+- [ ] **Monitoring:** Alert on last sync > 24h or sync failure
+- [ ] **Redundancy:** Configure backup connector (OSV, GitHub Advisory) for overlap
+- [ ] **Offline:** Maintain weekly offline bundle for disaster recovery
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/concelier/connectors.md`
+- **Connector config:** `docs/modules/concelier/operations/connectors/nvd.md`
+- **Related runbooks:** `connector-ghsa.md`, `connector-osv.md`
+- **Dashboard:** Grafana > Stella Ops > Feed Connectors
--- a/docs/operations/runbooks/connector-osv.md
+++ b/docs/operations/runbooks/connector-osv.md
@@ -0,0 +1,193 @@
+# Runbook: Feed Connector - OSV (Open Source Vulnerabilities) Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-006 - Feed Connector Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Concelier / OSV Connector |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.connector.osv-health` |
+
+---
+
+## Symptoms
+
+- [ ] OSV feed sync failing or stale
+- [ ] Alert `ConnectorOsvSyncFailed` firing
+- [ ] Error: "OSV API request failed" or "ecosystem sync failed"
+- [ ] OSV vulnerabilities missing from database
+- [ ] Metric `connector_sync_failures_total{source="osv"}` increasing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Open source ecosystem vulnerabilities may be missed |
+| **Data integrity** | Data becomes stale; no data loss |
+| **SLA impact** | Vulnerability currency SLO violated for affected ecosystems |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.connector.osv-health
+   ```
+
+2. **Check OSV sync status:**
+   ```bash
+   stella admin feeds status --source osv
+   ```
+
+3. **Test OSV API connectivity:**
+   ```bash
+   stella connector test osv
+   ```
+
+### Deep diagnosis
+
+1. **Check ecosystem-specific status:**
+   ```bash
+   stella connector osv ecosystems status
+   ```
+   Look for: Failed ecosystems, stale ecosystems
+
+2. **Check sync logs:**
+   ```bash
+   stella connector logs osv --last 1h --level error
+   ```
+   Look for: API errors, parsing failures, timeouts
+
+3. **Check for OSV API outage:**
+   ```bash
+   stella connector osv api-status
+   ```
+   Also check: https://osv.dev/
+
+4. **Check GCS bucket access (OSV uses GCS for bulk data):**
+   ```bash
+   stella connector osv gcs-status
+   ```
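+
+OSV publishes a per-ecosystem bulk export, `all.zip`, in the public `gs://osv-vulnerabilities` bucket. The same objects are reachable over HTTPS at `osv-vulnerabilities.storage.googleapis.com`. When `gcs-status` reports a problem, probing that object directly separates bucket reachability from connector configuration. The helper name below is hypothetical.
+
+```bash
+# Hypothetical helper: public HTTPS URL of an ecosystem's OSV bulk export
+# (the same object as gs://osv-vulnerabilities/<ecosystem>/all.zip).
+osv_bulk_url() {
+  local ecosystem="${1// /%20}"   # URL-encode spaces in the ecosystem name
+  printf 'https://osv-vulnerabilities.storage.googleapis.com/%s/all.zip\n' "$ecosystem"
+}
+
+# Probe reachability without downloading the archive (needs network access):
+#   curl -sfI "$(osv_bulk_url npm)" >/dev/null && echo "npm bulk export reachable"
+```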
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Retry sync for specific ecosystem:**
+   ```bash
+   stella admin feeds refresh --source osv --ecosystem npm
+   ```
+
+2. **Sync from GCS bucket directly (faster for bulk):**
+   ```bash
+   stella connector osv sync-from-gcs
+   ```
+
+3. **Load from offline bundle:**
+   ```bash
+   stella offline load --source osv --package osv-bundle-latest.tar.gz
+   ```
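+
+When an ecosystem refresh fails intermittently, retrying with exponential backoff avoids hammering the upstream API. The wrapper below is a generic sketch (hypothetical name `retry_backoff`). It wraps the documented `stella admin feeds refresh` command but is not part of the `stella` CLI, and the attempt count and delays are illustrative.
+
+```bash
+# Hypothetical helper: retry a command with exponential backoff.
+# Usage: retry_backoff <max_attempts> <base_delay_seconds> <command> [args...]
+retry_backoff() {
+  local max="$1" delay="$2"
+  shift 2
+  local attempt=1
+  until "$@"; do
+    if (( attempt >= max )); then
+      echo "giving up after $attempt attempts: $*" >&2
+      return 1
+    fi
+    sleep "$delay"
+    delay=$(( delay * 2 ))   # double the wait after each failure
+    attempt=$(( attempt + 1 ))
+  done
+}
+
+# Example: up to 4 attempts, waiting 30 s, 60 s, then 120 s between them.
+#   retry_backoff 4 30 stella admin feeds refresh --source osv --ecosystem npm
+```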
+
+### Root cause fix
+
+**If API request failing:**
+
+1. Check API endpoint:
+   ```bash
+   stella connector osv api-test
+   ```
+
+2. If outbound traffic must go through a proxy, configure it:
+   ```bash
+   stella connector config set osv.proxy <proxy-url>
+   ```
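+
+To test the OSV API independently of the connector, you can send a query to OSV's `POST https://api.osv.dev/v1/query` endpoint. It takes a package name, ecosystem, and version. The sketch below (hypothetical helper `osv_query_body`) only builds the JSON body and does no escaping, so it assumes plain package names and versions. The `lodash` example is arbitrary.
+
+```bash
+# Hypothetical helper: JSON body for OSV's query API (POST https://api.osv.dev/v1/query).
+# Assumes names and versions contain no characters that need JSON escaping.
+osv_query_body() {
+  local ecosystem="$1" name="$2" version="$3"
+  printf '{"package":{"name":"%s","ecosystem":"%s"},"version":"%s"}\n' \
+    "$name" "$ecosystem" "$version"
+}
+
+# Direct probe (needs network access; add --proxy <proxy-url> if required).
+# A 200 status means the API is reachable from this host:
+#   curl -s -o /dev/null -w '%{http_code}\n' -X POST \
+#     -d "$(osv_query_body npm lodash 4.17.20)" https://api.osv.dev/v1/query
+```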
+
+**If GCS access failing:**
+
+1. Check GCS connectivity:
+   ```bash
+   stella connector osv gcs-test
+   ```
+
+2. Use anonymous access (the default; the OSV bulk-data bucket is publicly readable):
+   ```bash
+   stella connector config set osv.gcs_auth anonymous
+   ```
+
+3. Or configure service account:
+   ```bash
+   stella connector config set osv.gcs_credentials /path/to/sa-key.json
+   ```
+
+**If specific ecosystem failing:**
+
+1. Disable problematic ecosystem temporarily:
+   ```bash
+   stella connector config set osv.ecosystems.disabled <ecosystem>
+   ```
+
+2. Check ecosystem data format:
+   ```bash
+   stella connector osv ecosystem-check <ecosystem>
+   ```
+
+**If parsing errors:**
+
+1. Check for schema changes:
+   ```bash
+   stella connector osv schema-check
+   ```
+
+2. Update connector:
+   ```bash
+   stella upgrade --component connector-osv
+   ```
+
+### Verification
+
+```bash
+# Force sync
+stella admin feeds refresh --source osv
+
+# Monitor sync progress
+stella admin feeds status --source osv --watch
+
+# Verify ecosystem coverage
+stella connector osv ecosystems status
+
+# Query recent vulnerability
+stella vuln query OSV-2026-xxxx
+
+# Check no errors
+stella connector logs osv --level error --last 1h
+```
+
+---
+
+## Prevention
+
+- [ ] **Bulk sync:** Use GCS bulk sync for initial load and daily updates
+- [ ] **Monitoring:** Alert on ecosystem sync failures
+- [ ] **Redundancy:** NVD/GHSA provide overlapping coverage for major ecosystems
+- [ ] **Offline:** Maintain weekly offline bundle
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/concelier/connectors.md`
+- **Connector config:** `docs/modules/concelier/operations/connectors/osv.md`
+- **Related runbooks:** `connector-nvd.md`, `connector-ghsa.md`
+- **OSV API docs:** https://osv.dev/docs/
--- a/docs/operations/runbooks/connector-vendor-specific.md
+++ b/docs/operations/runbooks/connector-vendor-specific.md
@@ -0,0 +1,220 @@
+# Runbook Template: Feed Connector - Vendor-Specific Connectors
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-006 - Feed Connector Runbooks
+
+## Overview
+
+This is a template runbook for vendor-specific advisory feed connectors (Red Hat, Ubuntu, Debian, Oracle, VMware, etc.). Use it as the starting point for a runbook for each vendor connector.
+
+---
+
+## Metadata Template
+
+| Field | Value |
+|-------|-------|
+| **Component** | Concelier / [Vendor] Connector |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | [Date] |
+| **Doctor check** | `check.connector.[vendor]-health` |
+
+---
+
+## Common Vendor Connector Issues
+
+### Authentication Failures
+
+**Symptoms:**
+- Sync failing with 401/403 errors
+- "authentication failed" or "invalid credentials"
+
+**Resolution:**
+```bash
+# Check credentials
+stella connector credentials show <vendor>
+
+# Update credentials
+stella connector credentials update <vendor> --api-key <key>
+
+# Test connectivity
+stella connector test <vendor>
+```
+
+### Rate Limiting
+
+**Symptoms:**
+- Sync failing with 429 errors
+- "rate limit exceeded"
+
+**Resolution:**
+```bash
+# Check rate limit status
+stella connector <vendor> rate-limit-status
+
+# Increase sync interval
+stella connector config set <vendor>.sync_interval 6h
+
+# Enable delta sync
+stella connector config set <vendor>.delta_sync true
+```
+
+### Data Format Changes
+
+**Symptoms:**
+- Parsing errors in sync logs
+- "unexpected format" or "schema validation failed"
+
+**Resolution:**
+```bash
+# Check for schema changes
+stella connector <vendor> schema-check
+
+# Update connector
+stella upgrade --component connector-<vendor>
+```
+
+### Offline Bundle Refresh
+
+**Resolution:**
+```bash
+# Create offline bundle
+stella offline sync --feeds <vendor> --output <vendor>-bundle.tar.gz
+
+# Load offline bundle
+stella offline load --source <vendor> --package <vendor>-bundle.tar.gz
+```
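+
+Before loading a bundle that was carried across an air gap, confirm that it arrived intact. This sketch assumes you recorded the bundle's SHA-256 when you created it. This runbook does not say whether `stella offline sync` produces a checksum, so the helper `verify_bundle` and the recording step are hypothetical.
+
+```bash
+# Hypothetical helper: verify a bundle against a recorded SHA-256 before loading it.
+# Usage: verify_bundle <bundle-file> <expected-sha256-hex>
+verify_bundle() {
+  local file="$1" expected="$2"
+  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
+  local actual
+  actual=$(sha256sum "$file" | cut -d' ' -f1)
+  # Compare against the lowercased expected value (sha256sum prints lowercase hex).
+  if [ "$actual" != "${expected,,}" ]; then
+    echo "checksum mismatch for $file: got $actual" >&2
+    return 1
+  fi
+}
+
+# Record the checksum when the bundle is created:
+#   sha256sum <vendor>-bundle.tar.gz
+# Check it on the destination host before loading:
+#   verify_bundle <vendor>-bundle.tar.gz <recorded-hash> && \
+#     stella offline load --source <vendor> --package <vendor>-bundle.tar.gz
+```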
+
+---
+
+## Vendor-Specific Runbooks
+
+Use this template to create runbooks for:
+
+### Red Hat Security Data
+
+**Endpoint:** https://access.redhat.com/security/data/
+**Authentication:** None (public)
+**Connector:** `connector-redhat`
+
+Key commands:
+```bash
+stella connector test redhat
+stella admin feeds status --source redhat
+stella connector redhat cve-map-status  # RHSA to CVE mapping
+```
+
+### Ubuntu Security Notices
+
+**Endpoint:** https://ubuntu.com/security/notices
+**Authentication:** None (public)
+**Connector:** `connector-ubuntu`
+
+Key commands:
+```bash
+stella connector test ubuntu
+stella admin feeds status --source ubuntu
+stella connector ubuntu usn-status  # USN sync status
+```
+
+### Debian Security Tracker
+
+**Endpoint:** https://security-tracker.debian.org/
+**Authentication:** None (public)
+**Connector:** `connector-debian`
+
+Key commands:
+```bash
+stella connector test debian
+stella admin feeds status --source debian
+stella connector debian dla-status  # DLA sync status
+```
+
+### Oracle Security Alerts
+
+**Endpoint:** https://www.oracle.com/security-alerts/
+**Authentication:** Oracle account (optional)
+**Connector:** `connector-oracle`
+
+Key commands:
+```bash
+stella connector test oracle
+stella admin feeds status --source oracle
+stella connector oracle cpu-status  # Critical Patch Update status
+```
+
+### VMware Security Advisories
+
+**Endpoint:** https://www.vmware.com/security/advisories (VMware advisories are now published through the Broadcom support portal)
+**Authentication:** None (public)
+**Connector:** `connector-vmware`
+
+Key commands:
+```bash
+stella connector test vmware
+stella admin feeds status --source vmware
+stella connector vmware vmsa-status  # VMSA sync status
+```
+
+---
+
+## Diagnosis Checklist
+
+For any vendor connector issue:
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.connector.<vendor>-health
+   ```
+
+2. **Check sync status:**
+   ```bash
+   stella admin feeds status --source <vendor>
+   ```
+
+3. **Test connectivity:**
+   ```bash
+   stella connector test <vendor>
+   ```
+
+4. **Check logs:**
+   ```bash
+   stella connector logs <vendor> --last 1h --level error
+   ```
+
+5. **Check credentials (if applicable):**
+   ```bash
+   stella connector credentials show <vendor>
+   ```
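+
+If the connector's own error is ambiguous, a direct HTTP probe of the vendor endpoint often points to the right section of this template. The mapping below is a rough first guess, not a rule, because some APIs signal rate limiting with 403 rather than 429. `classify_status` is a hypothetical helper.
+
+```bash
+# Hypothetical helper: map an HTTP status from a direct probe of the vendor
+# endpoint to the matching section of this runbook.
+# Usage: classify_status <http_code>   (000 = curl could not connect)
+classify_status() {
+  case "$1" in
+    2??)     echo "ok" ;;
+    401|403) echo "auth: see Authentication Failures" ;;
+    429)     echo "rate-limited: see Rate Limiting" ;;
+    000)     echo "network: check DNS, proxy, firewall" ;;
+    5??)     echo "vendor-side error: retry later or load an offline bundle" ;;
+    *)       echo "unexpected status $1: check connector logs" ;;
+  esac
+}
+
+# Example probe (needs network access; use the vendor endpoint listed above):
+#   code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 <vendor-endpoint>)
+#   classify_status "$code"
+```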
+
+---
+
+## Resolution Checklist
+
+1. **Retry sync:**
+   ```bash
+   stella admin feeds refresh --source <vendor>
+   ```
+
+2. **Update credentials (if auth issue):**
+   ```bash
+   stella connector credentials update <vendor> --api-key <key>
+   ```
+
+3. **Update connector (if format changed):**
+   ```bash
+   stella upgrade --component connector-<vendor>
+   ```
+
+4. **Load offline bundle (if API unavailable):**
+   ```bash
+   stella offline load --source <vendor> --package <vendor>-bundle.tar.gz
+   ```
+
+---
+
+## Related Resources
+
+- **Connector architecture:** `docs/modules/concelier/connectors.md`
+- **Vendor connector configs:** `docs/modules/concelier/operations/connectors/`
+- **Related runbooks:** `connector-nvd.md`, `connector-ghsa.md`, `connector-osv.md`
--- a/docs/operations/runbooks/crypto-ops.md
+++ b/docs/operations/runbooks/crypto-ops.md
@@ -0,0 +1,370 @@
+# Regional Crypto Operations Runbook
+
+> **Sprint:** SPRINT_20260117_029_Runbook_coverage_expansion
+> **Task:** RUN-002 - Crypto Subsystem Runbook
+
+Status: PRODUCTION-READY (2026-01-17 UTC)
+
+## Scope
+Cryptographic subsystem operations including HSM management, regional crypto profile configuration, key rotation, and certificate management for all supported crypto profiles (International, FIPS, eIDAS, GOST, SM).
+
+---
+
+## Pre-flight Checklist
+
+### Environment Verification
+```bash
+# Check crypto subsystem health
+stella doctor --category crypto
+
+# Verify active crypto profile
+stella crypto profile show
+
+# List loaded crypto providers
+stella crypto providers list
+
+# Check key status
+stella crypto keys status
+```
+
+### Metrics to Watch
+- `stella_crypto_operations_total` - Crypto operation count by type
+- `stella_crypto_operation_duration_seconds` - Signing/verification latency
+- `stella_hsm_availability` - HSM availability (if configured)
+- `stella_cert_expiry_days` - Certificate expiration countdown
+
+---
+
+## Regional Crypto Profiles
+
+### Profile Overview
+
+| Profile | Use Case | Key Algorithms | Compliance |
+|---------|----------|----------------|------------|
+| `international` | Default, most deployments | RSA-2048+, ECDSA P-256/P-384, Ed25519 | General |
+| `fips` | US Government / FedRAMP | FIPS 140-2 approved algorithms only | FIPS 140-2 |
+| `eidas` | European Union | RSA-PSS, ECDSA, Ed25519 per ETSI TS 119 312 | eIDAS |
+| `gost` | Russian Federation | GOST R 34.10-2012, GOST R 34.11-2012 | Russian standards |
+| `sm` | China | SM2, SM3, SM4 | GM/T 0003-2012 (SM2), GM/T 0004-2012 (SM3), GM/T 0002-2012 (SM4) |
+
+### Switching Profiles
+
+1. **Pre-switch verification:**
+   ```bash
+   # Verify target profile is available
+   stella crypto profile verify --profile <target-profile>
+   
+   # Check for incompatible existing signatures
+   stella crypto audit --check-compatibility --target-profile <target-profile>
+   ```
+
+2. **Profile switch:**
+   ```bash
+   # Switch profile (requires service restart)
+   stella crypto profile set --profile <target-profile>
+   
+   # Restart services to apply
+   stella service restart --graceful
+   ```
+
+3. **Post-switch verification:**
+   ```bash
+   stella doctor --check check.crypto.fips,check.crypto.eidas,check.crypto.gost,check.crypto.sm
+   ```
+
+---
+
+## Standard Procedures
+
+### SP-001: Key Rotation
+
+**Frequency:** Quarterly or per policy
+**Duration:** ~15 minutes (no downtime)
+
+1. Generate new key (set the name once so later steps reuse it even if they run in a different month):
+   ```bash
+   NEW_KEY="signing-$(date +%Y%m)"
+
+   # For software keys
+   stella crypto keys generate --type signing --algorithm ecdsa-p256 --name "$NEW_KEY"
+   
+   # For HSM-backed keys
+   stella crypto keys generate --type signing --algorithm ecdsa-p256 --provider hsm --name "$NEW_KEY"
+   ```
+
+2. Activate new key:
+   ```bash
+   stella crypto keys activate --name "$NEW_KEY"
+   ```
+
+3. Verify signing with new key:
+   ```bash
+   echo "test" | stella crypto sign --output /dev/null
+   ```
+
+4. Schedule old key deactivation:
+   ```bash
+   stella crypto keys schedule-deactivation --name <old-key-name> --in 30d
+   ```
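+
+For reference, the `ecdsa-p256` algorithm used above corresponds to OpenSSL's `prime256v1` curve. The sketch below performs a sign/verify round-trip with a throwaway software key. It can help confirm that a host's OpenSSL tooling handles ECDSA P-256 when you troubleshoot verification on the relying-party side. It does not use Stella's key store or an HSM, and the function name is hypothetical.
+
+```bash
+# Sketch: ECDSA P-256 (OpenSSL name prime256v1) sign/verify round-trip with a
+# throwaway software key in a temporary directory.
+ecdsa_p256_roundtrip() {
+  local dir
+  dir=$(mktemp -d) || return 1
+  (
+    cd "$dir" || exit 1
+    openssl ecparam -name prime256v1 -genkey -noout -out key.pem
+    openssl ec -in key.pem -pubout -out pub.pem 2>/dev/null
+    printf 'test' > data.txt
+    openssl dgst -sha256 -sign key.pem -out data.sig data.txt
+    # Prints "Verified OK" and exits 0 only if the signature matches.
+    openssl dgst -sha256 -verify pub.pem -signature data.sig data.txt
+  )
+  local rc=$?
+  rm -rf "$dir"
+  return "$rc"
+}
+```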
+
+### SP-002: Certificate Renewal
+
+**When:** Certificate expiring within 30 days
+
+1. Check expiration:
+   ```bash
+   stella crypto certs check-expiry
+   ```
+
+2. Generate CSR:
+   ```bash
+   stella crypto certs csr --subject "CN=stellaops.example.com,O=Example Corp" --output cert.csr
+   ```
+
+3. Install renewed certificate:
+   ```bash
+   stella crypto certs install --cert renewed-cert.pem --chain ca-chain.pem
+   ```
+
+4. Verify certificate chain:
+   ```bash
+   stella doctor --check check.crypto.certchain
+   ```
+
+5. Restart services:
+   ```bash
+   stella service restart --graceful
+   ```
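+
+To cross-check `stella crypto certs check-expiry`, or to check a PEM file that is not yet installed, use OpenSSL's `x509 -checkend` option, which tests whether a certificate will still be valid a given number of seconds from now. The wrapper below (hypothetical name `cert_expires_within`) turns that into the 30-day test this procedure uses. OpenSSL also exits non-zero when it cannot parse the file, so a parse error is reported as "expiring". Check stderr in that case.
+
+```bash
+# Sketch: expiry check with OpenSSL.
+# Usage: cert_expires_within <cert.pem> <days>   (exit 0 = expires within <days>)
+cert_expires_within() {
+  local cert="$1" days="$2"
+  [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
+  # -checkend N exits 0 if the certificate is still valid N seconds from now.
+  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert" >/dev/null; then
+    return 1
+  fi
+  return 0
+}
+
+# Example: cert_expires_within renewed-cert.pem 30 && echo "renew now"
+```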
+
+### SP-003: HSM Health Check
+
+**Frequency:** Daily (automated) or on-demand
+
+1. Check HSM connectivity:
+   ```bash
+   stella crypto hsm status
+   ```
+
+2. Verify slot access:
+   ```bash
+   stella crypto hsm slots list
+   ```
+
+3. Test signing operation:
+   ```bash
+   stella crypto hsm test-sign
+   ```
+
+4. Check HSM metrics:
+   - Free objects/sessions
+   - Temperature/health (vendor-specific)
+
+---
+
+## Incident Procedures
+
+### INC-001: HSM Unavailable
+
+**Symptoms:**
+- Alert: `StellaHsmUnavailable`
+- Signing operations failing with "HSM connection error"
+
+**Investigation:**
+```bash
+# Check HSM status
+stella crypto hsm status
+
+# Test PKCS#11 module
+stella crypto hsm test-module
+
+# Check network to HSM
+stella network test --host <hsm-host> --port <hsm-port>
+```
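+
+If `stella network test` is unavailable on the host, bash's built-in `/dev/tcp` redirection with coreutils `timeout` gives a plain TCP check of the HSM port. This is a sketch with a hypothetical function name. A successful connection only proves the network path, not that the PKCS#11 module or HSM session works.
+
+```bash
+# Sketch: plain TCP reachability test using bash's /dev/tcp redirection.
+# Usage: tcp_reachable <host> <port> [timeout_seconds]   (exit 0 = port accepted a connection)
+tcp_reachable() {
+  local host="$1" port="$2" secs="${3:-5}"
+  # Host and port are passed as positional arguments, not spliced into the script.
+  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
+}
+
+# Example:
+#   tcp_reachable <hsm-host> <hsm-port> && echo "TCP OK" || echo "HSM port unreachable"
+```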
+
+**Resolution:**
+
+1. **Network issue:**
+   - Verify network path to HSM
+   - Check firewall rules
+   - Verify HSM appliance is powered on
+
+2. **Session exhaustion:**
+   ```bash
+   # Release stale sessions
+   stella crypto hsm sessions release --stale
+   
+   # Restart crypto service
+   stella service restart --service crypto-signer
+   ```
+
+3. **HSM failure:**
+   - Fail over to secondary HSM (if configured)
+   - Contact HSM vendor support
+   - Consider temporary fallback to software keys (with approval)
+
+### INC-002: Signing Key Compromised
+
+**CRITICAL - Follow incident response procedure**
+
+1. **Immediate containment:**
+   ```bash
+   # Revoke compromised key
+   stella crypto keys revoke --name <compromised-key> --reason compromise
+   
+   # Block signing with compromised key
+   stella crypto keys block --name <compromised-key>
+   ```
+
+2. **Generate replacement key:**
+   ```bash
+   stella crypto keys generate --type signing --algorithm ecdsa-p256 --name emergency-signing
+   stella crypto keys activate --name emergency-signing
+   ```
+
+3. **Notify downstream:**
+   - Update trust registries with new key
+   - Notify relying parties
+   - Publish key revocation notice
+
+4. **Forensics:**
+   ```bash
+   # Export key usage audit log
+   stella crypto audit export --key <compromised-key> --output /secure/key-audit.json
+   ```
+
+### INC-003: Certificate Expired
+
+**Symptoms:**
+- TLS connection failures
+- Alert: `StellaCertExpired`
+
+**Immediate Resolution:**
+
+1. If renewed certificate is available:
+   ```bash
+   stella crypto certs install --cert renewed-cert.pem --chain ca-chain.pem
+   stella service restart --graceful
+   ```
+
+2. If the renewed certificate is not ready, install a temporary self-signed certificate:
+   ```bash
+   # Emergency stopgap only: clients reject it unless they explicitly trust it; replace within 7 days
+   stella crypto certs generate-self-signed --days 7 --name emergency
+   stella crypto certs install --cert emergency.pem
+   stella service restart --graceful
+   ```
+
+3. Expedite certificate renewal process
+
+### INC-004: FIPS Mode Not Enabled
+
+**Symptoms:**
+- Alert: `StellaFipsNotEnabled`
+- Compliance audit failure
+
+**Resolution:**
+
+1. **Linux (RHEL 8/9 and derivatives, which provide `fips-mode-setup`):**
+   ```bash
+   # Enable FIPS mode
+   sudo fips-mode-setup --enable
+   
+   # Reboot required
+   sudo reboot
+   
+   # Verify after reboot
+   fips-mode-setup --check
+   ```
+
+2. **Windows:**
+   - Enable via Group Policy
+   - Or via registry:
+     ```powershell
+     Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\FipsAlgorithmPolicy" -Name "Enabled" -Value 1
+     Restart-Computer
+     ```
+
+3. Restart Stella services:
+   ```bash
+   stella service restart
+   stella doctor --check check.crypto.fips
+   ```
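+
+On Linux, the kernel exposes its FIPS flag in `/proc/sys/crypto/fips_enabled` (`1` when FIPS mode is on). Reading it is a quick way to confirm the reboot took effect before rerunning the Doctor check. The helper below is a hypothetical sketch. It reflects only the kernel setting, not whether Stella's crypto providers are running in FIPS mode.
+
+```bash
+# Sketch: read the Linux kernel's FIPS flag directly (1 = FIPS mode on).
+# Usage: kernel_fips_enabled [flag_file]   (defaults to /proc/sys/crypto/fips_enabled)
+kernel_fips_enabled() {
+  local flag_file="${1:-/proc/sys/crypto/fips_enabled}"
+  # A missing file means the kernel does not expose a FIPS flag at all.
+  [ -r "$flag_file" ] || return 1
+  [ "$(tr -d '[:space:]' < "$flag_file")" = "1" ]
+}
+```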
+
+---
+
+## Regional-Specific Procedures
+
+### GOST Configuration (Russian Federation)
+
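+1. Install the GOST engine (the package name depends on the distribution release; `libengine-gost-openssl1.1` targets OpenSSL 1.1):
+   ```bash
+   sudo apt install libengine-gost-openssl1.1
+   ```
+
+2. Configure Stella, pointing the engine path at the engines directory of the OpenSSL version in use (`engines-1.1` for OpenSSL 1.1, `engines-3` for OpenSSL 3.x):
+   ```bash
+   openssl version -e   # prints ENGINESDIR for the installed OpenSSL
+   stella crypto profile set --profile gost
+   stella crypto config set --gost-engine-path /usr/lib/x86_64-linux-gnu/engines-3/gost.so
+   ```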
+
+3. Verify:
+   ```bash
+   stella doctor --check check.crypto.gost
+   ```
+
+### SM Configuration (China)
+
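+1. Ensure OpenSSL 1.1.1+ is built with SM support (SM4 is listed as a cipher, SM3 as a digest):
+   ```bash
+   openssl version
+   openssl list -cipher-algorithms | grep -i sm4
+   openssl list -digest-algorithms | grep -i sm3
+   ```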
+
+2. Configure Stella:
+   ```bash
+   stella crypto profile set --profile sm
+   ```
+
+3. Verify:
+   ```bash
+   stella doctor --check check.crypto.sm
+   ```
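+
+A listing only shows that OpenSSL recognizes the algorithm name. The sketch below (hypothetical helper `sm3_selftest`) also checks OpenSSL's output against the standard SM3 test vector for the message `abc` from GM/T 0004-2012, which confirms the digest is actually usable.
+
+```bash
+# Sketch: verify OpenSSL's SM3 against the standard test vector SM3("abc").
+sm3_selftest() {
+  local expected='66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0'
+  local actual
+  # openssl dgst prints "<label>= <hex>"; the digest is the last field.
+  actual=$(printf 'abc' | openssl dgst -sm3 2>/dev/null | awk '{print $NF}')
+  [ "$actual" = "$expected" ]
+}
+
+# Example: sm3_selftest && echo "SM3 OK" || echo "SM3 unavailable or incorrect"
+```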
+
+---
+
+## Monitoring Dashboard
+
+Access: Grafana → Dashboards → Stella Ops → Crypto Subsystem
+
+Key panels:
+- Signing operation latency
+- Key usage by key ID
+- HSM availability
+- Certificate expiration countdown
+- Crypto profile in use
+
+---
+
+## Evidence Capture
+
+```bash
+# Comprehensive crypto diagnostics
+stella crypto diagnostics --output /tmp/crypto-diag-$(date +%Y%m%dT%H%M%S).tar.gz
+```
+
+Bundle includes:
+- Active crypto profile
+- Key inventory (public keys only)
+- Certificate chain
+- HSM status
+- Operation audit log (last 24h)
+
+---
+
+## Escalation Path
+
+1. **L1 (On-call):** Certificate installs, key activation
+2. **L2 (Security team):** Key rotation, HSM issues
+3. **L3 (Crypto SME):** Algorithm issues, compliance questions
+4. **HSM Vendor:** Hardware failures
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/runbooks/evidence-locker-ops.md
+++ b/docs/operations/runbooks/evidence-locker-ops.md
@@ -0,0 +1,408 @@
+# Evidence Locker Operations Runbook
+
+> **Sprint:** SPRINT_20260117_029_Runbook_coverage_expansion
+> **Task:** RUN-003 - Evidence Locker Runbook
+
+Status: PRODUCTION-READY (2026-01-17 UTC)
+
+## Scope
+Evidence locker operations including storage management, integrity verification, attestation management, provenance chain maintenance, and disaster recovery procedures.
+
+---
+
+## Pre-flight Checklist
+
+### Environment Verification
+```bash
+# Check evidence locker health
+stella doctor --category evidence
+
+# Verify storage accessibility
+stella evidence status
+
+# Check index health
+stella evidence index status
+
+# Verify anchor chain
+stella evidence anchor verify --latest
+```
+
+### Metrics to Watch
+- `stella_evidence_artifacts_total` - Total artifacts stored
+- `stella_evidence_retrieval_latency_seconds` - Retrieval latency P99
+- `stella_evidence_storage_bytes` - Storage consumption
+- `stella_merkle_anchor_age_seconds` - Time since last anchor
+
+---
+
+## Standard Procedures
+
+### SP-001: Daily Integrity Check
+
+**Frequency:** Daily (automated) or on-demand
+**Duration:** Varies by locker size (typically 5-30 minutes)
+
+1. Run integrity verification:
+   ```bash
+   # Quick check (sample-based)
+   stella evidence verify --mode quick
+   
+   # Full check (all artifacts)
+   stella evidence verify --mode full
+   ```
+
+2. Review results:
+   ```bash
+   stella evidence verify-report --latest
+   ```
+
+3. Address any failures:
+   ```bash
+   # List failed artifacts
+   stella evidence verify-report --latest --filter failed
+   ```
+
+### SP-002: Index Maintenance
+
+**Frequency:** Weekly or after large ingestion
+**Duration:** ~10 minutes
+
+1. Check index health:
+   ```bash
+   stella evidence index status
+   ```
+
+2. Refresh index if needed:
+   ```bash
+   # Incremental refresh
+   stella evidence index refresh
+   
+   # Full rebuild (if corruption suspected)
+   stella evidence index rebuild
+   ```
+
+3. Optimize index:
+   ```bash
+   stella evidence index optimize
+   ```
+
+### SP-003: Merkle Anchoring
+
+**Frequency:** Per policy (default: every 6 hours)
+**Duration:** ~2 minutes
+
+1. Create new anchor:
+   ```bash
+   stella evidence anchor create
+   ```
+
+2. Verify anchor chain:
+   ```bash
+   stella evidence anchor verify --all
+   ```
+
+3. Export anchor for external archival:
+   ```bash
+   stella evidence anchor export --latest --output anchor-$(date +%Y%m%dT%H%M%S).json
+   ```
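+
+To illustrate what an anchor commits to, the sketch below computes a binary Merkle root over a list of leaf hashes using SHA-256. It is a generic construction, not the locker's documented format. It hashes the concatenated hex strings rather than raw bytes, pairs an odd last node with itself, and uses no leaf/node domain separation. All three choices are assumptions, so the result will not match Stella's anchors. The function names are hypothetical.
+
+```bash
+# Sketch: generic SHA-256 Merkle root over hex leaf hashes (not Stella's format).
+sha256_hex() { sha256sum | cut -d' ' -f1; }
+
+# Usage: merkle_root <leaf-hash>...   (exit 1 if no leaves are given)
+merkle_root() {
+  local -a level=("$@")
+  (( ${#level[@]} > 0 )) || return 1
+  while (( ${#level[@]} > 1 )); do
+    local -a next=()
+    local i left right
+    for (( i = 0; i < ${#level[@]}; i += 2 )); do
+      left="${level[i]}"
+      right="${level[i+1]:-$left}"   # odd count: pair the last node with itself
+      next+=("$(printf '%s%s' "$left" "$right" | sha256_hex)")
+    done
+    level=("${next[@]}")
+  done
+  printf '%s\n' "${level[0]}"
+}
+```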
+
+### SP-004: Storage Cleanup
+
+**Frequency:** Monthly or when storage alerts trigger
+**Duration:** Varies
+
+1. Review storage usage:
+   ```bash
+   stella evidence storage stats
+   ```
+
+2. Apply retention policy:
+   ```bash
+   # Dry run first
+   stella evidence cleanup --apply-retention --dry-run
+   
+   # Execute cleanup
+   stella evidence cleanup --apply-retention
+   ```
+
+3. Archive old evidence (if required):
+   ```bash
+   stella evidence archive --older-than 365d --output /archive/evidence-$(date +%Y).tar
+   ```
+
+---
+
+## Incident Procedures
+
+### INC-001: Integrity Verification Failure
+
+**Symptoms:**
+- Alert: `StellaEvidenceIntegrityFailure`
+- Verification reports hash mismatch
+
+**Investigation:**
+```bash
+# Get failure details
+stella evidence verify-report --latest --filter failed --format json > /tmp/integrity-failures.json
+
+# Check specific artifact
+stella evidence inspect <artifact-id>
+
+# Check provenance
+stella evidence provenance show <artifact-id>
+```
+
+**Resolution:**
+
+1. **Isolated corruption:**
+   ```bash
+   # Attempt recovery from replica (if available)
+   stella evidence recover --id <artifact-id> --source replica
+   
+   # If no replica, mark as corrupted
+   stella evidence mark-corrupted --id <artifact-id> --reason "hash-mismatch"
+   ```
+
+2. **Widespread corruption:**
+   - Stop evidence ingestion
+   - Identify corruption extent
+   - Restore from backup if necessary
+   - Escalate to L3
+
+3. **False positive (software bug):**
+   - Verify with multiple hash implementations
+   - Check for recent software updates
+   - Report bug if confirmed
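+
+For the "verify with multiple hash implementations" step, the sketch below hashes a file with both coreutils `sha256sum` and `openssl dgst -sha256` and compares the results. It assumes you can export the artifact as a plain file and that the recorded digest is plain SHA-256 over its bytes; this runbook confirms neither. If both tools agree with each other but not with the recorded hash, corruption is more likely than a hashing bug. The function name is hypothetical.
+
+```bash
+# Sketch: hash a file with two independent SHA-256 implementations and compare.
+# Usage: hash_cross_check <file>   (prints the digest; exit 1 if the tools disagree)
+hash_cross_check() {
+  local file="$1" a b
+  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
+  a=$(sha256sum "$file" | cut -d' ' -f1)
+  # openssl prints "<label>= <hex>" when reading stdin; take the last field.
+  b=$(openssl dgst -sha256 < "$file" | awk '{print $NF}')
+  if [ "$a" != "$b" ]; then
+    echo "implementations disagree: sha256sum=$a openssl=$b" >&2
+    return 1
+  fi
+  printf '%s\n' "$a"
+}
+```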
+
+### INC-002: Evidence Retrieval Failure
+
+**Symptoms:**
+- Alert: `StellaEvidenceRetrievalFailed`
+- API returning 404 for known artifacts
+
+**Investigation:**
+```bash
+# Check if artifact exists
+stella evidence exists <artifact-id>
+
+# Check index
+stella evidence index lookup <artifact-id>
+
+# Check storage backend
+stella evidence storage check <artifact-id>
+```
+
+**Resolution:**
+
+1. **Index corruption:**
+   ```bash
+   # Rebuild index
+   stella evidence index rebuild
+   ```
+
+2. **Storage backend issue:**
+   ```bash
+   # Check storage health
+   stella doctor --check check.storage.evidencelocker
+   
+   # Verify storage connectivity
+   stella evidence storage test
+   ```
+
+3. **File system issue:**
+   - Check disk health
+   - Verify file permissions
+   - Check mount status
+
+### INC-003: Anchor Chain Break
+
+**Symptoms:**
+- Alert: `StellaMerkleAnchorChainBroken`
+- Anchor verification fails
+
+**Investigation:**
+```bash
+# Check anchor chain
+stella evidence anchor verify --all --verbose
+
+# Find break point
+stella evidence anchor list --show-links
+
+# Inspect specific anchor
+stella evidence anchor inspect <anchor-id>
+```
+
+**Resolution:**
+
+1. **Single broken link:**
+   ```bash
+   # Attempt to recover from backup
+   stella evidence anchor recover --id <anchor-id> --source backup
+   ```
+
+2. **Multiple breaks:**
+   - Stop new anchoring
+   - Assess extent of damage
+   - Restore from backup or rebuild chain
+
+3. **Create new chain segment:**
+   ```bash
+   # Start new chain (preserves old chain as archived)
+   stella evidence anchor new-chain --reason "chain-break-recovery"
+   ```
+
+### INC-004: Storage Full
+
+**Symptoms:**
+- Alert: `StellaEvidenceStorageFull`
+- Ingestion failing
+
+**Immediate Actions:**
+```bash
+# Check storage usage
+stella evidence storage stats
+
+# Emergency cleanup of temporary files
+stella evidence cleanup --temp-only
+
+# Find large/old artifacts
+stella evidence storage analyze --sort size --limit 20
+```
+
+**Resolution:**
+
+1. **Apply retention policy:**
+   ```bash
+   stella evidence cleanup --apply-retention --aggressive
+   ```
+
+2. **Archive old evidence:**
+   ```bash
+   stella evidence archive --older-than 180d --compress
+   ```
+
+3. **Expand storage:**
+   - Follow your cloud provider's procedure for expanding the volume
+   - Alternatively, attach an additional storage volume
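+
+Before archiving, it can help to estimate how much space an `--older-than 180d` pass might free. The sketch below is read-only. It assumes that evidence is stored as regular files under the `/var/lib/stellaops/evidence` path used in DR-001, and that file modification time roughly tracks artifact age. Both are assumptions, and the CLI's own age criteria may differ. The sketch relies on GNU `find -printf`:
+
+```bash
+#!/usr/bin/env bash
+# Sum the sizes (bytes) of regular files older than N days, as `find -mtime +N`
+# counts them. Assumes mtime approximates artifact age. Read-only: nothing is
+# moved or deleted.
+reclaimable_bytes() {
+  local dir="${1:-/var/lib/stellaops/evidence}" days="${2:-180}"
+  find "$dir" -type f -mtime +"$days" -printf '%s\n' \
+    | awk '{ total += $1 } END { print total + 0 }'
+}
+```
+
+For example, `reclaimable_bytes /var/lib/stellaops/evidence 180 | numfmt --to=iec` prints a human-readable estimate.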
+
+---
+
+## Disaster Recovery
+
+### DR-001: Full Evidence Locker Recovery
+
+**Prerequisites:**
+- Backup available
+- Target storage provisioned
+- Recovery environment ready
+
+**Procedure:**
+
+1. Provision new storage:
+   ```bash
+   stella evidence storage provision --size <size>
+   ```
+
+2. Restore from backup:
+   ```bash
+   # List available backups
+   stella backup list --type evidence-locker
+   
+   # Restore
+   stella evidence restore --backup-id <backup-id> --target /var/lib/stellaops/evidence
+   ```
+
+3. Verify restoration:
+   ```bash
+   stella evidence verify --mode full
+   stella evidence anchor verify --all
+   ```
+
+4. Update service configuration:
+   ```bash
+   stella config set EvidenceLocker:Path /var/lib/stellaops/evidence
+   stella service restart
+   ```
+
+### DR-002: Point-in-Time Recovery
+
+For recovering to a specific point in time:
+
+1. Identify target anchor:
+   ```bash
+   stella evidence anchor list --before <timestamp>
+   ```
+
+2. Restore to that point:
+   ```bash
+   stella evidence restore --to-anchor <anchor-id>
+   ```
+
+3. Verify integrity:
+   ```bash
+   stella evidence verify --mode full --to-anchor <anchor-id>
+   ```
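+
+Step 1 amounts to choosing the newest anchor whose timestamp does not exceed the recovery target. Suppose you capture the anchor listing as tab-separated `<anchor-id>` and `<timestamp>` pairs; this is a hypothetical format, so adapt it to the real output. The following sketch makes that selection. It assumes all timestamps are ISO 8601 in UTC with the same precision, so they compare correctly as strings:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical input on stdin: "<anchor-id>\t<iso-8601-utc>" lines, any order.
+# Assumes uniform ISO 8601 UTC timestamps (string order == time order).
+# Prints the id of the newest anchor at or before the target; exits 1 if none.
+anchor_at_or_before() {
+  awk -F'\t' -v target="$1" '
+    BEGIN { best = "" }
+    $2 <= target && $2 > best { best = $2; id = $1 }
+    END { if (id == "") exit 1; print id }'
+}
+```
+
+Pass the printed id to `stella evidence restore --to-anchor`.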
+
+---
+
+## Offline Mode Operations
+
+### Preparing Offline Evidence Pack
+
+```bash
+# Export evidence for specific artifact
+stella evidence export --digest <artifact-digest> --output evidence-pack.tar.gz
+
+# Export with all dependencies
+stella evidence export --digest <artifact-digest> --include-deps --output evidence-full.tar.gz
+```
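+
+When an evidence pack crosses an air gap, record its digest at export time. The receiving side can then detect transfer corruption before running the offline verification steps. The following is a minimal sketch using only coreutils. The `.sha256` sidecar file is a convention introduced here; `stella evidence export` is not known to produce it:
+
+```bash
+#!/usr/bin/env bash
+# Sidecar convention introduced here (not a stella output): "<pack>.sha256"
+# stores a relative filename, so the pack and its digest can move together.
+seal_pack() {
+  (cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256")
+}
+
+# On the offline side: succeeds only if the pack still matches its recorded digest.
+check_pack() {
+  (cd "$(dirname "$1")" && sha256sum --check --quiet "$(basename "$1").sha256")
+}
+```
+
+Copy both `evidence-pack.tar.gz` and `evidence-pack.tar.gz.sha256`. This check only detects transfer damage; `stella evidence verify --offline` remains the integrity check for the evidence itself.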
+
+### Verifying Evidence Offline
+
+```bash
+# Verify evidence pack without network
+stella evidence verify --offline --input evidence-pack.tar.gz
+
+# Replay verdict using evidence
+stella replay --evidence evidence-pack.tar.gz --output verdict.json
+```
+
+---
+
+## Monitoring Dashboard
+
+Access: Grafana → Dashboards → Stella Ops → Evidence Locker
+
+Key panels:
+- Artifact ingestion rate
+- Retrieval latency
+- Storage utilization trend
+- Integrity check status
+- Anchor chain health
+
+---
+
+## Evidence Capture
+
+For any incident:
+```bash
+stella evidence diagnostics --output /tmp/evidence-diag-$(date +%Y%m%dT%H%M%S).tar.gz
+```
+
+Bundle includes:
+- Index status
+- Storage stats
+- Recent anchor chain
+- Integrity check results
+- Operation audit log
+
+---
+
+## Escalation Path
+
+1. **L1 (On-call):** Standard procedures, cleanup operations
+2. **L2 (Platform team):** Index rebuild, anchor issues
+3. **L3 (Architecture):** Chain recovery, DR procedures
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/runbooks/orchestrator-evidence-missing.md
+++ b/docs/operations/runbooks/orchestrator-evidence-missing.md
@@ -0,0 +1,183 @@
+# Runbook: Release Orchestrator - Required Evidence Not Found
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-004 - Release Orchestrator Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Release Orchestrator |
+| **Severity** | High |
+| **On-call scope** | Platform team, Security team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.orchestrator.evidence-availability` |
+
+---
+
+## Symptoms
+
+- [ ] Promotion failing with "required evidence not found"
+- [ ] Alert `OrchestratorEvidenceMissing` firing
+- [ ] Gate evaluation blocked waiting for evidence
+- [ ] Error: "SBOM not found" or "attestation missing"
+- [ ] Evidence chain incomplete for artifact
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Promotion blocked until evidence is generated |
+| **Data integrity** | A required security artifact is missing and must be regenerated before the promotion can proceed |
+| **SLA impact** | Release blocked; compliance requirements not met |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.orchestrator.evidence-availability
+   ```
+
+2. **List missing evidence for promotion:**
+   ```bash
+   stella promotion evidence <promotion-id> --missing
+   ```
+
+3. **Check what evidence exists for artifact:**
+   ```bash
+   stella evidence list --artifact <digest>
+   ```
+
+### Deep diagnosis
+
+1. **Check evidence chain completeness:**
+   ```bash
+   stella evidence chain --artifact <digest> --verbose
+   ```
+   Look for: Missing nodes in the chain
+
+2. **Check if scan completed:**
+   ```bash
+   stella scanner jobs list --artifact <digest>
+   ```
+   Problem if: No completed scan or scan failed
+
+3. **Check if attestation was created:**
+   ```bash
+   stella attest list --subject <digest>
+   ```
+   Problem if: No attestation or attestation failed
+
+4. **Check evidence store health:**
+   ```bash
+   stella evidence store health
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Generate missing SBOM:**
+   ```bash
+   stella scan image --image <image-ref> --sbom-only
+   ```
+
+2. **Generate missing attestation:**
+   ```bash
+   stella attest create --subject <digest> --type slsa-provenance
+   ```
+
+3. **Re-scan artifact to regenerate all evidence:**
+   ```bash
+   stella scan image --image <image-ref> --force
+   ```
+
+### Root cause fix
+
+**If scan never ran:**
+
+1. Check why artifact wasn't scanned:
+   ```bash
+   stella scanner queue list --artifact <digest>
+   ```
+
+2. Configure automatic scanning on push:
+   ```bash
+   stella scanner config set auto_scan.enabled true
+   stella scanner config set auto_scan.triggers "push,promote"
+   ```
+
+**If evidence was generated but not stored:**
+
+1. Check evidence store connectivity:
+   ```bash
+   stella evidence store health
+   ```
+
+2. Retry evidence storage:
+   ```bash
+   stella evidence retry-store --artifact <digest>
+   ```
+
+**If attestation signing failed:**
+
+1. Check attestor status:
+   ```bash
+   stella attest status
+   ```
+
+2. See `attestor-signing-failed.md` runbook
+
+**If evidence expired or was deleted:**
+
+1. Check evidence retention policy:
+   ```bash
+   stella evidence policy show
+   ```
+
+2. Regenerate evidence:
+   ```bash
+   stella scan image --image <image-ref> --force
+   stella attest create --subject <digest> --type slsa-provenance
+   ```
+
+### Verification
+
+```bash
+# Check all evidence now exists
+stella evidence list --artifact <digest>
+
+# Verify evidence chain is complete
+stella evidence chain --artifact <digest>
+
+# Retry promotion
+stella promotion retry <promotion-id>
+
+# Verify promotion proceeds
+stella promotion status <promotion-id>
+```
+
+---
+
+## Prevention
+
+- [ ] **Auto-scan:** Enable automatic scanning for all pushed images
+- [ ] **Gates:** Configure evidence requirements clearly in promotion policy
+- [ ] **Monitoring:** Alert on evidence generation failures
+- [ ] **Retention:** Set appropriate evidence retention periods
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/evidence-locker/architecture.md`
+- **Related runbooks:** `orchestrator-promotion-stuck.md`, `attestor-signing-failed.md`
+- **Evidence requirements:** `docs/operations/evidence-requirements.md`
--- a/docs/operations/runbooks/orchestrator-gate-timeout.md
+++ b/docs/operations/runbooks/orchestrator-gate-timeout.md
@@ -0,0 +1,178 @@
+# Runbook: Release Orchestrator - Gate Evaluation Timeout
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-004 - Release Orchestrator Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Release Orchestrator |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.orchestrator.gate-timeout` |
+
+---
+
+## Symptoms
+
+- [ ] Promotion gates timing out before completing evaluation
+- [ ] Alert `OrchestratorGateTimeout` firing
+- [ ] Error: "gate evaluation timeout exceeded"
+- [ ] Promotion stuck waiting for gate response
+- [ ] Metric `orchestrator_gate_timeout_total` increasing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Promotions delayed or blocked; release pipeline stalled |
+| **Data integrity** | No data loss; promotion can be retried |
+| **SLA impact** | Release SLO violated if timeout persists |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.orchestrator.gate-timeout
+   ```
+
+2. **Identify timed-out gates:**
+   ```bash
+   stella promotion gates <promotion-id> --status timeout
+   ```
+
+3. **Check gate service health:**
+   ```bash
+   stella orch gate-services status
+   ```
+
+### Deep diagnosis
+
+1. **Check specific gate latency:**
+   ```bash
+   stella orch gate stats --gate <gate-name> --last 1h
+   ```
+   Look for: P95 latency, timeout rate
+
+2. **Check external service connectivity:**
+   ```bash
+   stella orch connectivity --gate <gate-name>
+   ```
+
+3. **Check gate evaluation logs:**
+   ```bash
+   stella orch logs --gate <gate-name> --promotion <promotion-id>
+   ```
+   Look for: Slow queries, external API delays
+
+4. **Check policy engine latency (for policy gates):**
+   ```bash
+   stella policy stats --last 10m
+   ```
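+
+If you export raw per-evaluation latencies for a gate, you can compute the P95 referenced above with the nearest-rank method. The export format is an assumption. This small sketch expects one latency value in milliseconds per line; the orchestrator's own `gate stats` may use a different percentile method:
+
+```bash
+#!/usr/bin/env bash
+# Nearest-rank P95 of newline-separated numbers on stdin (assumed: one value
+# in ms per line). rank = ceil(0.95 * n), computed with integer arithmetic to
+# avoid floating-point rounding at the boundary.
+p95() {
+  sort -n | awk '
+    { v[NR] = $1 }
+    END {
+      if (NR == 0) exit 1
+      rank = int((95 * NR + 99) / 100)
+      print v[rank]
+    }'
+}
+```
+
+With few samples, P95 is effectively the slowest evaluation. For example, `printf '120\n80\n3000\n95\n110\n' | p95` prints `3000`. This is why a few slow external calls dominate the P95 of a gate that is evaluated only a handful of times.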
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Increase timeout for specific gate:**
+   ```bash
+   stella orch config set gates.<gate-name>.timeout 5m
+   stella orch reload
+   ```
+
+2. **Skip the timed-out gate (requires approval):**
+   ```bash
+   stella promotion gate skip <promotion-id> <gate-name> \
+     --reason "External service timeout - approved by <approver>"
+   ```
+
+3. **Retry the promotion:**
+   ```bash
+   stella promotion retry <promotion-id>
+   ```
+
+### Root cause fix
+
+**If external service is slow:**
+
+1. Configure gate retry with backoff:
+   ```bash
+   stella orch config set gates.<gate-name>.retries 3
+   stella orch config set gates.<gate-name>.retry_backoff 5s
+   ```
+
+2. Enable gate result caching:
+   ```bash
+   stella orch config set gates.<gate-name>.cache_ttl 5m
+   ```
+
+3. Configure circuit breaker:
+   ```bash
+   stella orch config set gates.<gate-name>.circuit_breaker.enabled true
+   stella orch config set gates.<gate-name>.circuit_breaker.threshold 5
+   ```
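+
+The orchestrator enforces the retry and backoff settings above itself. If you script around a slow dependency during an incident, for example by polling gate connectivity until it recovers, use a bounded retry with exponential backoff so the script does not hammer the failing service. The following is a generic sketch. `retry_with_backoff` is a hypothetical helper, and its doubling delay is an assumption that may not match how the orchestrator interprets `retry_backoff`:
+
+```bash
+#!/usr/bin/env bash
+# Generic helper (hypothetical): run a command up to <attempts> times,
+# doubling the delay after each failure. The doubling is an assumption,
+# not the orchestrator's documented behavior. Delays are whole seconds.
+# Usage: retry_with_backoff <attempts> <initial-delay-seconds> <command> [args...]
+retry_with_backoff() {
+  local attempts="$1" delay="$2" n=1
+  shift 2
+  until "$@"; do
+    if [ "$n" -ge "$attempts" ]; then
+      echo "giving up after $n attempts: $*" >&2
+      return 1
+    fi
+    echo "attempt $n failed; retrying in ${delay}s" >&2
+    sleep "$delay"
+    delay=$((delay * 2))
+    n=$((n + 1))
+  done
+}
+```
+
+For example, `retry_with_backoff 4 5 stella orch connectivity --gate <gate-name>` waits 5s, 10s, then 20s between its four attempts. Unlike the orchestrator's `retries` setting, `attempts` here includes the first try.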
+
+**If policy evaluation is slow:**
+
+1. Optimize policy (see `policy-evaluation-slow.md` runbook)
+
+2. Increase policy worker count:
+   ```bash
+   stella policy config set opa.workers 4
+   ```
+
+**If evidence retrieval is slow:**
+
+1. Enable evidence pre-fetching:
+   ```bash
+   stella orch config set gates.evidence_prefetch true
+   ```
+
+2. Increase evidence cache:
+   ```bash
+   stella orch config set evidence.cache_size 1000
+   stella orch config set evidence.cache_ttl 10m
+   ```
+
+### Verification
+
+```bash
+# Retry promotion
+stella promotion retry <promotion-id>
+
+# Monitor gate evaluation
+stella promotion gates <promotion-id> --watch
+
+# Check gate latency improved
+stella orch gate stats --gate <gate-name> --last 10m
+
+# Verify no timeouts
+stella orch logs --filter "timeout" --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Timeouts:** Set appropriate timeouts based on gate SLAs (default: 2m)
+- [ ] **Monitoring:** Alert on gate P95 latency > 1m
+- [ ] **Caching:** Enable caching for slow gates
+- [ ] **Circuit breakers:** Enable circuit breakers for external service gates
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/release-orchestrator/gates.md`
+- **Related runbooks:** `orchestrator-promotion-stuck.md`, `policy-evaluation-slow.md`
+- **Dashboard:** Grafana > Stella Ops > Gate Latency
--- a/docs/operations/runbooks/orchestrator-promotion-stuck.md
+++ b/docs/operations/runbooks/orchestrator-promotion-stuck.md
@@ -0,0 +1,168 @@
+# Runbook: Release Orchestrator - Promotion Job Not Progressing
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-004 - Release Orchestrator Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Release Orchestrator |
+| **Severity** | Critical |
+| **On-call scope** | Platform team, Release team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.orchestrator.job-health` |
+
+---
+
+## Symptoms
+
+- [ ] Promotion job stuck in "in_progress" state for >10 minutes
+- [ ] No progress updates in promotion timeline
+- [ ] Alert `OrchestratorPromotionStuck` firing
+- [ ] UI shows promotion spinner indefinitely
+- [ ] Downstream environment not receiving promoted artifact
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Release blocked, cannot promote to target environment |
+| **Data integrity** | Artifact is safe; promotion can be retried |
+| **SLA impact** | Release SLO violated if not resolved within 30 minutes |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.orchestrator.job-health
+   ```
+
+2. **Check promotion status:**
+   ```bash
+   stella promotion status <promotion-id>
+   ```
+   Look for: Current step, last update time, any error messages
+
+3. **Check orchestrator service:**
+   ```bash
+   stella orch status
+   ```
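+
+You can apply the Symptoms threshold (no progress for more than 10 minutes) mechanically to the last update time that `stella promotion status` reports. This sketch assumes the timestamp is in a form GNU `date -d` can parse, such as ISO 8601 UTC. The same check works on lock acquisition times when judging whether a lock is stale:
+
+```bash
+#!/usr/bin/env bash
+# Assumes GNU date can parse the timestamp (e.g. 2026-01-17T10:00:00Z).
+# Succeeds if <timestamp> is more than <threshold-seconds> before <now-epoch>.
+# Returns 2 if the timestamp cannot be parsed.
+# Usage: is_stale <timestamp> [threshold-seconds, default 600] [now-epoch]
+is_stale() {
+  local last="$1" threshold="${2:-600}" now="${3:-$(date -u +%s)}" last_epoch
+  last_epoch=$(date -u -d "$last" +%s) || return 2
+  [ $((now - last_epoch)) -gt "$threshold" ]
+}
+```
+
+For example: `is_stale "<last-update-time>" && echo "promotion exceeds the 10-minute threshold"`.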
+
+### Deep diagnosis
+
+1. **Get detailed promotion trace:**
+   ```bash
+   stella promotion trace <promotion-id> --verbose
+   ```
+   Look for: Which step is stuck, any timeouts
+
+2. **Check gate evaluation status:**
+   ```bash
+   stella promotion gates <promotion-id>
+   ```
+   Problem if: Gate stuck waiting for external service
+
+3. **Check target environment connectivity:**
+   ```bash
+   stella orch connectivity --target <env-name>
+   ```
+
+4. **Check for lock contention:**
+   ```bash
+   stella orch locks list
+   ```
+   Problem if: Stale locks on the artifact or environment
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **If gate is stuck waiting for external service:**
+   ```bash
+   # Skip the stuck gate (requires approval)
+   stella promotion gate skip <promotion-id> <gate-name> --reason "External service timeout"
+   ```
+
+2. **If lock is stale:**
+   ```bash
+   # Release the lock (use with caution)
+   stella orch locks release <lock-id> --force
+   ```
+
+3. **If orchestrator is unresponsive:**
+   ```bash
+   stella service restart orchestrator
+   ```
+
+### Root cause fix
+
+**If external gate service is slow:**
+
+1. Increase gate timeout:
+   ```bash
+   stella orch config set gates.<gate-name>.timeout 5m
+   ```
+
+2. Configure gate retry:
+   ```bash
+   stella orch config set gates.<gate-name>.retries 3
+   ```
+
+**If target environment is unreachable:**
+
+1. Check network connectivity to target
+2. Verify credentials for target environment:
+   ```bash
+   stella orch credentials verify --target <env-name>
+   ```
+
+**If there is database lock contention:**
+
+1. Increase lock timeout:
+   ```bash
+   stella orch config set locks.timeout 60s
+   ```
+
+2. Enable optimistic locking:
+   ```bash
+   stella orch config set locks.mode optimistic
+   ```
+
+### Verification
+
+```bash
+# Check promotion completed
+stella promotion status <promotion-id>
+
+# Verify artifact in target environment
+stella orch artifacts list --env <target-env> --filter <artifact-digest>
+
+# Check no stuck promotions
+stella promotion list --status in_progress --older-than 5m
+```
+
+---
+
+## Prevention
+
+- [ ] **Timeouts:** Configure appropriate timeouts for all gates
+- [ ] **Monitoring:** Alert on promotions stuck > 10 minutes
+- [ ] **Health checks:** Enable connectivity pre-checks before promotion
+- [ ] **Documentation:** Document SLAs for external gate services
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/release-orchestrator/architecture.md`
+- **Related runbooks:** `orchestrator-gate-timeout.md`, `orchestrator-evidence-missing.md`
+- **Dashboard:** Grafana > Stella Ops > Release Orchestrator
--- a/docs/operations/runbooks/orchestrator-quota-exceeded.md
+++ b/docs/operations/runbooks/orchestrator-quota-exceeded.md
@@ -0,0 +1,189 @@
+# Runbook: Release Orchestrator - Promotion Quota Exhausted
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-004 - Release Orchestrator Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Release Orchestrator |
+| **Severity** | Medium |
+| **On-call scope** | Platform team, Release team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.orchestrator.quota-status` |
+
+---
+
+## Symptoms
+
+- [ ] Promotions failing with "quota exceeded"
+- [ ] Alert `OrchestratorQuotaExceeded` firing
+- [ ] Error: "promotion rate limit reached" or "daily quota exhausted"
+- [ ] New promotions being rejected
+- [ ] Queued promotions not processing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | New releases blocked until the quota resets or is increased |
+| **Data integrity** | No data loss; promotions queued for later |
+| **SLA impact** | Release frequency SLO may be violated |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.orchestrator.quota-status
+   ```
+
+2. **Check current quota usage:**
+   ```bash
+   stella orch quota status
+   ```
+
+3. **Check quota limits:**
+   ```bash
+   stella orch quota limits show
+   ```
+
+### Deep diagnosis
+
+1. **Check promotion history:**
+   ```bash
+   stella promotion list --last 24h --count
+   ```
+   Look for: Unusual spike in promotions
+
+2. **Check per-environment quotas:**
+   ```bash
+   stella orch quota status --by-environment
+   ```
+
+3. **Check for runaway automation:**
+   ```bash
+   stella promotion list --last 1h --by-actor
+   ```
+   Problem if: A single actor or service account is making many promotions
+
+4. **Check when quota resets:**
+   ```bash
+   stella orch quota reset-time
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Request temporary quota increase:**
+   ```bash
+   stella orch quota request-increase --amount 50 --reason "Release deadline"
+   ```
+
+2. **Prioritize critical promotions:**
+   ```bash
+   stella promotion priority set <promotion-id> high
+   ```
+
+3. **Cancel unnecessary queued promotions:**
+   ```bash
+   stella promotion list --status queued
+   stella promotion cancel <promotion-id>
+   ```
+
+### Root cause fix
+
+**If the high volume is legitimate:**
+
+1. Increase quota limits:
+   ```bash
+   stella orch quota limits set --daily 200 --hourly 50
+   ```
+
+2. Increase per-environment limits:
+   ```bash
+   stella orch quota limits set --env production --daily 50
+   ```
+
+**If runaway automation is the cause:**
+
+1. Identify the source:
+   ```bash
+   stella promotion list --last 1h --by-actor --verbose
+   ```
+
+2. Revoke or rate-limit the service account:
+   ```bash
+   stella auth rate-limit set <service-account> --promotions-per-hour 10
+   ```
+
+3. Fix the automation bug
+
+**If promotion retries are causing the spike:**
+
+1. Check for failing promotions causing retries:
+   ```bash
+   stella promotion list --status failed --last 24h
+   ```
+
+2. Fix underlying promotion failures (see other runbooks)
+
+3. Configure retry limits:
+   ```bash
+   stella orch config set promotion.max_retries 3
+   stella orch config set promotion.retry_backoff 5m
+   ```
+
+**If the quota is too restrictive for the workload:**
+
+1. Analyze actual promotion patterns:
+   ```bash
+   stella orch quota analyze --last 30d
+   ```
+
+2. Adjust quotas based on analysis:
+   ```bash
+   stella orch quota limits set --daily <recommended>
+   ```
+
+### Verification
+
+```bash
+# Check quota status
+stella orch quota status
+
+# Verify promotions processing
+stella promotion list --status in_progress
+
+# Test new promotion
+stella promotion create --test --dry-run
+
+# Check no quota errors
+stella orch logs --filter "quota" --level error --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Monitoring:** Alert at 80% quota usage
+- [ ] **Limits:** Set appropriate quotas based on team size and release frequency
+- [ ] **Automation:** Implement rate limiting in CI/CD pipelines
+- [ ] **Review:** Regularly review and adjust quotas based on usage patterns
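+
+The 80% alert compares usage against the limit for the same window (hourly or daily). The following sketch shows that check with integer arithmetic. It assumes you already have the used and limit counts, for example from `stella orch quota status`; that command's output format is not specified here:
+
+```bash
+#!/usr/bin/env bash
+# Classify quota usage for one window: ok, warn (>= warn-percent of limit),
+# or exhausted (>= limit). Compares used*100 against limit*warn_pct so
+# integer math stays exact. Assumes counts are obtained separately.
+# Usage: quota_level <used> <limit> [warn-percent, default 80]
+quota_level() {
+  local used="$1" limit="$2" warn_pct="${3:-80}"
+  if [ "$limit" -le 0 ]; then
+    echo "invalid-limit" >&2
+    return 2
+  fi
+  if [ "$used" -ge "$limit" ]; then
+    echo "exhausted"
+  elif [ $((used * 100)) -ge $((limit * warn_pct)) ]; then
+    echo "warn"
+  else
+    echo "ok"
+  fi
+}
+```
+
+With the `--daily 200` limit from the root-cause steps, the warning starts at 160 promotions.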
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/release-orchestrator/quotas.md`
+- **Related runbooks:** `orchestrator-promotion-stuck.md`
+- **Quota management:** `docs/operations/quota-management.md`
--- a/docs/operations/runbooks/orchestrator-rollback-failed.md
+++ b/docs/operations/runbooks/orchestrator-rollback-failed.md
@@ -0,0 +1,189 @@
+# Runbook: Release Orchestrator - Rollback Operation Failed
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-004 - Release Orchestrator Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Release Orchestrator |
+| **Severity** | Critical |
+| **On-call scope** | Platform team, Release team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.orchestrator.rollback-health` |
+
+---
+
+## Symptoms
+
+- [ ] Rollback operation failing or stuck
+- [ ] Alert `OrchestratorRollbackFailed` firing
+- [ ] Error: "rollback failed" or "cannot restore previous version"
+- [ ] Target environment in inconsistent state
+- [ ] Previous artifact not available for deployment
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Rollback blocked; a potentially broken release remains in production |
+| **Data integrity** | Environment may be in partial rollback state |
+| **SLA impact** | Incident resolution blocked; the outage continues until rollback succeeds |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.orchestrator.rollback-health
+   ```
+
+2. **Check rollback status:**
+   ```bash
+   stella rollback status <rollback-id>
+   ```
+
+3. **Check previous deployment history:**
+   ```bash
+   stella orch deployments list --env <env-name> --last 10
+   ```
+
+### Deep diagnosis
+
+1. **Check why rollback failed:**
+   ```bash
+   stella rollback trace <rollback-id> --verbose
+   ```
+   Look for: Which step failed, error message
+
+2. **Check previous artifact availability:**
+   ```bash
+   stella orch artifacts get <previous-digest> --check
+   ```
+   Problem if: Artifact was deleted or is missing from the registry
+
+3. **Check environment state:**
+   ```bash
+   stella orch env status <env-name> --detailed
+   ```
+
+4. **Check for deployment locks:**
+   ```bash
+   stella orch locks list --env <env-name>
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Force-release the deployment lock if it is stuck:**
+   ```bash
+   stella orch locks release --env <env-name> --force
+   ```
+
+2. **Manual rollback using specific artifact:**
+   ```bash
+   stella deploy --env <env-name> --artifact <previous-digest> --force
+   ```
+
+3. **If the previous artifact is unavailable, deploy the last known good artifact:**
+   ```bash
+   stella orch deployments list --env <env-name> --status success
+   stella deploy --env <env-name> --artifact <last-good-digest>
+   ```
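+
+When you pick `<last-good-digest>` by hand, skip the currently deployed (broken) digest and take the newest successful deployment before it. The following sketch works over a hypothetical tab-separated export of the deployment list with the columns `<iso-timestamp>`, `<digest>` and `<status>`. The real `stella orch deployments list` output format may differ:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical input on stdin: "<iso-8601-utc>\t<digest>\t<status>" lines,
+# any order. Real CLI output may differ. Prints the digest of the newest
+# successful deployment that is not <current-digest> (empty if none).
+last_good_digest() {
+  awk -F'\t' -v cur="$1" '$3 == "success" && $2 != cur' \
+    | LC_ALL=C sort -r \
+    | head -n 1 \
+    | cut -f2
+}
+```
+
+For example, pipe the captured list into `last_good_digest <current-digest>` and pass the result to `stella deploy --artifact`.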
+
+### Root cause fix
+
+**If the previous artifact is not in the registry:**
+
+1. Check artifact retention policy:
+   ```bash
+   stella registry retention show
+   ```
+
+2. Restore from backup registry:
+   ```bash
+   stella registry restore --artifact <digest> --from backup
+   ```
+
+3. Increase artifact retention:
+   ```bash
+   stella registry retention set --min-versions 10
+   ```
+
+**If the deployment service is unavailable:**
+
+1. Check deployment target connectivity:
+   ```bash
+   stella orch connectivity --target <env-name>
+   ```
+
+2. Check deployment agent status:
+   ```bash
+   stella orch agent status --env <env-name>
+   ```
+
+**If the environment configuration has drifted:**
+
+1. Check environment configuration:
+   ```bash
+   stella orch env config diff <env-name>
+   ```
+
+2. Reset environment to known state:
+   ```bash
+   stella orch env reset <env-name> --to-baseline
+   ```
+
+**If the database state is inconsistent:**
+
+1. Check orchestrator database:
+   ```bash
+   stella orch db verify
+   ```
+
+2. Repair deployment state:
+   ```bash
+   stella orch repair --deployment <deployment-id>
+   ```
+
+### Verification
+
+```bash
+# Verify rollback completed
+stella rollback status <rollback-id>
+
+# Verify environment state
+stella orch env status <env-name>
+
+# Verify correct version deployed
+stella orch deployments current --env <env-name>
+
+# Health check the environment
+stella orch health-check --env <env-name>
+```
+
+---
+
+## Prevention
+
+- [ ] **Retention:** Maintain at least 5 previous versions in registry
+- [ ] **Testing:** Test rollback procedure in staging regularly
+- [ ] **Monitoring:** Alert on rollback failures immediately
+- [ ] **Documentation:** Document manual rollback procedures per environment
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/release-orchestrator/rollback.md`
+- **Related runbooks:** `orchestrator-promotion-stuck.md`, `orchestrator-evidence-missing.md`
+- **Rollback procedures:** `docs/operations/rollback-procedures.md`
--- a/docs/operations/runbooks/policy-compilation-failed.md
+++ b/docs/operations/runbooks/policy-compilation-failed.md
@@ -0,0 +1,189 @@
+# Runbook: Policy Engine - Rego Compilation Errors
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-003 - Policy Engine Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Policy Engine |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.policy.compilation-health` |
+
+---
+
+## Symptoms
+
+- [ ] Policy deployment failing with "compilation error"
+- [ ] Alert `PolicyCompilationFailed` firing
+- [ ] Error: "rego_parse_error" or "rego_type_error"
+- [ ] New policies not taking effect
+- [ ] OPA rejecting policy bundle
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | New policies cannot be deployed; the engine keeps running the previously loaded policies |
+| **Data integrity** | Existing policies continue to work; new rules not enforced |
+| **SLA impact** | Policy updates blocked; security posture may be outdated |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.policy.compilation-health
+   ```
+
+2. **Check policy compilation status:**
+   ```bash
+   stella policy status --compilation
+   ```
+
+3. **Validate specific policy:**
+   ```bash
+   stella policy validate --file <policy-file>
+   ```
+
+### Deep diagnosis
+
+1. **Get detailed compilation errors:**
+   ```bash
+   stella policy compile --verbose
+   ```
+   Look for: Line numbers, error types, undefined references
+
+2. **Check for syntax errors:**
+   ```bash
+   stella policy lint --file <policy-file>
+   ```
+
+3. **Check for type errors:**
+   ```bash
+   stella policy typecheck --file <policy-file>
+   ```
+
+4. **Check OPA version compatibility:**
+   ```bash
+   stella policy opa version
+   stella policy check-compat --file <policy-file>
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Rollback to last working policy:**
+   ```bash
+   stella policy rollback --to-last-good
+   ```
+
+2. **Disable the failing policy:**
+   ```bash
+   stella policy disable <policy-id>
+   stella policy reload
+   ```
+
+3. **Use previous bundle:**
+   ```bash
+   stella policy bundle load --version <previous-version>
+   ```
+
+### Root cause fix
+
+**If syntax error:**
+
+1. Get exact error location:
+   ```bash
+   stella policy validate --file <policy-file> --show-line
+   ```
+
+2. Common syntax issues:
+   - Missing brackets or braces
+   - Invalid rule head syntax
+   - Incorrect import statements
+
+3. Fix and re-validate:
+   ```bash
+   stella policy validate --file <fixed-policy.rego>
+   ```
+
+**If undefined reference:**
+
+1. Check for missing imports:
+   ```bash
+   stella policy analyze --file <policy-file> --show-imports
+   ```
+
+2. Verify data references exist:
+   ```bash
+   stella policy data show
+   ```
+
+3. Add missing imports or data definitions
+
+**If type error:**
+
+1. Check type mismatches:
+   ```bash
+   stella policy typecheck --file <policy-file> --verbose
+   ```
+
+2. Common type issues:
+   - Comparing incompatible types
+   - Invalid function arguments
+   - Missing type annotations
+
+**If OPA version incompatibility:**
+
+1. Check Rego version features used:
+   ```bash
+   stella policy analyze --file <policy-file> --show-features
+   ```
+
+2. Rewrite the policy to avoid features the running OPA version does not support, or upgrade OPA
+
+### Verification
+
+```bash
+# Validate fixed policy
+stella policy validate --file <fixed-policy.rego>
+
+# Test policy compilation
+stella policy compile --file <fixed-policy.rego>
+
+# Deploy policy
+stella policy deploy --file <fixed-policy.rego>
+
+# Test policy evaluation
+stella policy evaluate --test
+```
+
+---
+
+## Prevention
+
+- [ ] **CI/CD:** Add policy validation to CI pipeline before deployment
+- [ ] **Linting:** Run `stella policy lint` on all policy changes
+- [ ] **Testing:** Write unit tests for policies with `stella policy test`
+- [ ] **Staging:** Deploy to staging environment before production
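+
+For the CI/CD item, a pipeline step can run `stella policy validate --file` over every `.rego` file and fail the build if any file is rejected. The following sketches such a step. The validator command is a parameter, so you can exercise the loop with a stand-in. It defaults to the `stella policy validate --file` invocation used earlier in this runbook:
+
+```bash
+#!/usr/bin/env bash
+# Validate every .rego file under <dir>; return non-zero if any file fails.
+# The file path is appended as the validator's last argument.
+# Usage: validate_policies <dir> [validator command...]
+validate_policies() {
+  local dir="$1" failed=0 f
+  shift
+  # Default validator: the CLI invocation used earlier in this runbook.
+  [ "$#" -gt 0 ] || set -- stella policy validate --file
+  while IFS= read -r -d '' f; do
+    # </dev/null keeps the validator from consuming the file list on stdin.
+    if "$@" "$f" </dev/null; then
+      echo "ok   $f"
+    else
+      echo "FAIL $f" >&2
+      failed=$((failed + 1))
+    fi
+  done < <(find "$dir" -type f -name '*.rego' -print0 | sort -z)
+  echo "$failed policy file(s) failed validation"
+  [ "$failed" -eq 0 ]
+}
+```
+
+The function reports every rejected file and returns non-zero, so a CI job that calls it stops before `stella policy deploy` runs.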
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/policy/architecture.md`
+- **Related runbooks:** `policy-opa-crash.md`, `policy-evaluation-slow.md`
+- **Rego reference:** https://www.openpolicyagent.org/docs/latest/policy-language/
+- **Policy testing:** `docs/modules/policy/testing.md`
--- a/docs/operations/runbooks/policy-evaluation-slow.md
+++ b/docs/operations/runbooks/policy-evaluation-slow.md
@@ -0,0 +1,174 @@
+# Runbook: Policy Engine - Evaluation Latency High
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-003 - Policy Engine Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Policy Engine |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.policy.evaluation-latency` |
+
+---
+
+## Symptoms
+
+- [ ] Policy evaluation takes >500ms (warning) or >2s (critical)
+- [ ] Gate decisions timing out in CI/CD pipelines
+- [ ] Alert `PolicyEvaluationSlow` firing
+- [ ] Metric `policy_evaluation_duration_seconds` P95 > 1s
+- [ ] Users report "policy check taking too long"
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Slow release gate checks, CI/CD pipeline delays |
+| **Data integrity** | No data loss; decisions are still correct |
+| **SLA impact** | Gate latency SLO violated (target: P95 < 500ms) |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.policy.evaluation-latency
+   ```
+
+2. **Check policy engine status:**
+   ```bash
+   stella policy status
+   ```
+
+3. **Check recent evaluation times:**
+   ```bash
+   stella policy stats --last 10m
+   ```
+   Look for: P95 latency, cache hit rate
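+
+You may be able to read the P95 directly from Prometheus and compare it with this runbook's thresholds. This assumes `policy_evaluation_duration_seconds` is exported as a Prometheus histogram, so check how the metric is actually exposed. In the following sketch, `PROM_URL` and the 10-minute rate window are placeholders, and `policy_p95_seconds` requires `curl` and `jq`:
+
+```bash
+#!/usr/bin/env bash
+# Assumption: the metric is a Prometheus histogram (_bucket series) and
+# PROM_URL points at the Prometheus server that scrapes the policy engine.
+PROM_URL="${PROM_URL:-http://localhost:9090}"
+
+# Query P95 evaluation latency (seconds) over the last 10 minutes.
+policy_p95_seconds() {
+  curl -fsSG "$PROM_URL/api/v1/query" \
+    --data-urlencode 'query=histogram_quantile(0.95, sum by (le) (rate(policy_evaluation_duration_seconds_bucket[10m])))' \
+    | jq -r '.data.result[0].value[1] // "NaN"'
+}
+
+# Map a latency in seconds to this runbook's thresholds (>0.5s warning, >2s critical).
+classify_latency() {
+  awk -v s="$1" 'BEGIN {
+    if (s !~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) print "no-data"
+    else if (s + 0 > 2) print "critical"
+    else if (s + 0 > 0.5) print "warning"
+    else print "ok"
+  }'
+}
+```
+
+For example: `classify_latency "$(policy_p95_seconds)"`. A `warning` result already means the P95 < 500ms target in Impact is being missed.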
+
+### Deep diagnosis
+
+1. **Profile a slow evaluation:**
+   ```bash
+   stella policy evaluate --image <image-ref> --profile
+   ```
+   Look for: Which phase is slowest (parse, compile, execute)
+
+2. **Check OPA compilation cache:**
+   ```bash
+   stella policy cache stats
+   ```
+   Problem if: Cache hit rate < 90%
+
+3. **Check policy complexity:**
+   ```bash
+   stella policy analyze --complexity
+   ```
+   Problem if: Cyclomatic complexity > 50 or rule count > 200
+
+4. **Check external data fetches:**
+   ```bash
+   stella policy logs --filter "external fetch" --level debug
+   ```
+   Problem if: Many external fetches or slow responses
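+
+The 90% cache floor from step 2 is easy to misread when only raw counters are available. The sketch below assumes you can read separate hit and miss counts. The counter names and the output format of `stella policy cache stats` are not documented here. Integer division rounds down, so a rate of 89.9% correctly counts as below the floor.
+
+```bash
+#!/usr/bin/env bash
+# Integer cache hit rate (percent) from hit and miss counters.
+hit_rate_pct() {
+  local hits=$1 misses=$2
+  local total=$(( hits + misses ))
+  if (( total == 0 )); then
+    # No traffic yet: report 0 so the check below flags it for a closer look.
+    echo 0
+    return
+  fi
+  echo $(( hits * 100 / total ))
+}
+
+# Exit status 0 when the hit rate meets the 90% floor used in this runbook.
+cache_healthy() {
+  (( $(hit_rate_pct "$1" "$2") >= 90 ))
+}
+```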
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Clear and warm the compilation cache:**
+   ```bash
+   stella policy cache clear
+   stella policy cache warm
+   ```
+
+2. **Increase OPA worker count:**
+   ```bash
+   stella policy config set opa.workers 4
+   stella policy reload
+   ```
+
+3. **Enable evaluation result caching:**
+   ```bash
+   stella policy config set cache.evaluation_ttl 60s
+   stella policy reload
+   ```
+
+### Root cause fix
+
+**If policy is too complex:**
+
+1. Analyze and simplify policy:
+   ```bash
+   stella policy analyze --suggest-optimizations
+   ```
+
+2. Split large policies into modules:
+   ```bash
+   stella policy refactor --auto-split
+   ```
+
+**If external data fetches are slow:**
+
+1. Increase external data cache TTL:
+   ```bash
+   stella policy config set external_data.cache_ttl 5m
+   ```
+
+2. Pre-fetch external data:
+   ```bash
+   stella policy external-data prefetch
+   ```
+
+**If Rego compilation is slow:**
+
+1. Enable partial evaluation:
+   ```bash
+   stella policy config set opa.partial_eval true
+   ```
+
+2. Pre-compile policies:
+   ```bash
+   stella policy compile --all
+   ```
+
+### Verification
+
+```bash
+# Run evaluation and check latency
+stella policy evaluate --image <image-ref> --timing
+
+# Check P95 latency
+stella policy stats --last 5m
+
+# Verify cache is effective
+stella policy cache stats
+```
+
+---
+
+## Prevention
+
+- [ ] **Review:** Review policy complexity before deployment
+- [ ] **Monitoring:** Alert on P95 latency > 300ms
+- [ ] **Caching:** Ensure evaluation cache is enabled
+- [ ] **Pre-warming:** Add cache warming to deployment pipeline
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/policy/architecture.md`
+- **Related runbooks:** `policy-opa-crash.md`, `policy-compilation-failed.md`
+- **Dashboard:** Grafana > Stella Ops > Policy Engine
--- a/docs/operations/runbooks/policy-opa-crash.md
+++ b/docs/operations/runbooks/policy-opa-crash.md
@@ -0,0 +1,205 @@
+# Runbook: Policy Engine - OPA Process Crashed
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-003 - Policy Engine Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Policy Engine |
+| **Severity** | Critical |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.policy.opa-health` |
+
+---
+
+## Symptoms
+
+- [ ] Policy evaluations failing with "OPA unavailable" error
+- [ ] Alert `PolicyOPACrashed` firing
+- [ ] OPA process exited unexpectedly
+- [ ] Error: "connection refused" when connecting to OPA
+- [ ] Metric `policy_opa_restarts_total` increasing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | All policy evaluations fail; gate decisions blocked |
+| **Data integrity** | No data loss; decisions delayed until OPA recovers |
+| **SLA impact** | Gate latency SLO violated; release pipeline blocked |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.policy.opa-health
+   ```
+
+2. **Check OPA process status:**
+   ```bash
+   stella policy status
+   ```
+   Look for: OPA process state, restart count
+
+3. **Check OPA logs for crash reason:**
+   ```bash
+   stella policy opa logs --last 30m --level error
+   ```
+
+### Deep diagnosis
+
+1. **Check OPA memory usage before crash:**
+   ```bash
+   stella policy stats --opa-metrics
+   ```
+   Problem if: Memory usage near limit before crash
+
+2. **Check for problematic policy:**
+   ```bash
+   stella policy list --last-error
+   ```
+   Look for: Policies that caused evaluation errors
+
+3. **Check OPA configuration:**
+   ```bash
+   stella policy opa config show
+   ```
+   Look for: Invalid configuration, missing bundles
+
+4. **Check for infinite loops in Rego:**
+   ```bash
+   stella policy analyze --detect-loops
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Restart OPA process:**
+   ```bash
+   stella policy opa restart
+   ```
+
+2. **If OPA keeps crashing, start in safe mode:**
+   ```bash
+   stella policy opa start --safe-mode
+   ```
+   Note: Safe mode disables custom policies
+
+3. **Temporarily enable fail-open mode:**
+   ```bash
+   stella policy config set failopen true
+   stella policy reload
+   ```
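+   **Warning:** Only use this if compliance allows fail-open mode. While it is enabled, releases that cannot be evaluated pass the gate unchecked. Set `failopen` back to `false` once OPA is stable.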
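+
+The hypothetical function below makes the trade-off concrete. It is not Stella Ops code, and its result values and argument order are assumptions. It shows that the `failopen` setting only matters when OPA cannot answer, and that in that case it turns a blocked release into an admitted one.
+
+```bash
+#!/usr/bin/env bash
+# Illustrative gate decision; not the policy engine's actual logic.
+# $1: OPA result ("allow", "deny", or "unavailable")
+# $2: value of the failopen setting ("true" or "false")
+gate_decision() {
+  local opa_result=$1 failopen=$2
+  case "$opa_result" in
+    allow|deny)
+      # OPA answered, so its decision stands regardless of failopen.
+      echo "$opa_result"
+      ;;
+    unavailable)
+      # Fail-open admits the release; fail-closed blocks it.
+      if [[ "$failopen" == "true" ]]; then
+        echo allow
+      else
+        echo deny
+      fi
+      ;;
+    *)
+      echo "gate_decision: unknown result '$opa_result'" >&2
+      return 2
+      ;;
+  esac
+}
+```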
+
+### Root cause fix
+
+**If OOM killed:**
+
+1. Increase OPA memory limit:
+   ```bash
+   stella policy opa config set memory_limit 2Gi
+   stella policy opa restart
+   ```
+
+2. Enable garbage collection tuning:
+   ```bash
+   stella policy opa config set gc_min_heap_size 256Mi
+   stella policy opa config set gc_max_heap_size 1Gi
+   ```
+
+**If a policy caused the crash:**
+
+1. Identify problematic policy:
+   ```bash
+   stella policy list --status error
+   ```
+
+2. Disable the problematic policy:
+   ```bash
+   stella policy disable <policy-id>
+   stella policy reload
+   ```
+
+3. Fix and re-enable:
+   ```bash
+   stella policy validate --file <fixed-policy.rego>
+   stella policy update <policy-id> --file <fixed-policy.rego>
+   stella policy enable <policy-id>
+   ```
+
+**If bundle loading failed:**
+
+1. Check bundle integrity:
+   ```bash
+   stella policy bundle verify
+   ```
+
+2. Rebuild bundle:
+   ```bash
+   stella policy bundle build --output bundle.tar.gz
+   stella policy bundle load bundle.tar.gz
+   ```
+
+**If the configuration is invalid:**
+
+1. Reset to default configuration:
+   ```bash
+   stella policy opa config reset
+   ```
+
+2. Reconfigure with validated settings:
+   ```bash
+   stella policy opa config set workers 4
+   stella policy opa config set decision_log true
+   stella policy opa restart
+   ```
+
+### Verification
+
+```bash
+# Check OPA is running
+stella policy status
+
+# Check OPA health
+stella policy opa health
+
+# Test policy evaluation
+stella policy evaluate --test
+
+# Check no crashes in recent logs
+stella policy opa logs --level error --last 30m
+
+# Monitor stability
+stella policy stats --watch
+```
+
+---
+
+## Prevention
+
+- [ ] **Resources:** Set appropriate memory limits based on policy complexity
+- [ ] **Validation:** Validate all policies before deployment
+- [ ] **Monitoring:** Alert on OPA restart count > 2 in 10 minutes
+- [ ] **Testing:** Load test policies before production deployment
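+
+In Prometheus, the restart alert above would typically be written as `increase(policy_opa_restarts_total[10m]) > 2`. The sketch below spells out the same sliding-window logic in bash, assuming you have restart times as epoch seconds. Where those timestamps come from is left open.
+
+```bash
+#!/usr/bin/env bash
+# Count restarts within the last WINDOW_S seconds of NOW and report
+# whether more than THRESHOLD occurred.
+# Usage: restart_alert NOW WINDOW_S THRESHOLD TS...
+restart_alert() {
+  local now=$1 window=$2 threshold=$3
+  shift 3
+  local count=0 ts
+  for ts in "$@"; do
+    # Keep only timestamps inside the window (now - window, now].
+    if (( ts > now - window && ts <= now )); then
+      count=$(( count + 1 ))
+    fi
+  done
+  if (( count > threshold )); then
+    echo "alert: $count restarts in ${window}s"
+  else
+    echo "ok: $count restarts in ${window}s"
+  fi
+}
+```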
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/policy/architecture.md`
+- **Related runbooks:** `policy-evaluation-slow.md`, `policy-compilation-failed.md`
+- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Policy/`
+- **OPA documentation:** https://www.openpolicyagent.org/docs/latest/
--- a/docs/operations/runbooks/policy-storage-unavailable.md
+++ b/docs/operations/runbooks/policy-storage-unavailable.md
@@ -0,0 +1,178 @@
+# Runbook: Policy Engine - Policy Storage Backend Down
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-003 - Policy Engine Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Policy Engine |
+| **Severity** | Critical |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.policy.storage-health` |
+
+---
+
+## Symptoms
+
+- [ ] Policy operations failing with "storage unavailable"
+- [ ] Alert `PolicyStorageUnavailable` firing
+- [ ] Error: "failed to connect to policy store" or "database connection refused"
+- [ ] Policy updates not persisting
+- [ ] OPA unable to load bundles from storage
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Policy updates fail; evaluations may continue against cached policies |
+| **Data integrity** | Policy changes not persisted; risk of inconsistent state |
+| **SLA impact** | Policy management blocked; evaluations use cached data |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.policy.storage-health
+   ```
+
+2. **Check storage connectivity:**
+   ```bash
+   stella policy storage status
+   ```
+
+3. **Check database health:**
+   ```bash
+   stella db status --component policy
+   ```
+
+### Deep diagnosis
+
+1. **Check PostgreSQL connectivity:**
+   ```bash
+   stella db ping --database policy
+   ```
+
+2. **Check connection pool status:**
+   ```bash
+   stella db pool-status --database policy
+   ```
+   Problem if: Pool exhausted, connections timing out
+
+3. **Check storage logs:**
+   ```bash
+   stella policy logs --filter "storage" --level error --last 30m
+   ```
+
+4. **Check disk space (if local storage):**
+   ```bash
+   stella policy storage disk-usage
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Enable read-only mode (use cached policies):**
+   ```bash
+   stella policy config set storage.read_only true
+   stella policy reload
+   ```
+
+2. **Switch to backup storage:**
+   ```bash
+   stella policy storage failover --to backup
+   ```
+
+3. **Restart policy service to reconnect:**
+   ```bash
+   stella service restart policy-engine
+   ```
+
+### Root cause fix
+
+**If the database connection is failing:**
+
+1. Check database status:
+   ```bash
+   stella db status --database policy --verbose
+   ```
+
+2. Restart database connection pool:
+   ```bash
+   stella db pool-restart --database policy
+   ```
+
+3. Check and increase connection limits:
+   ```bash
+   stella db config set policy.max_connections 50
+   ```
+
+**If disk space is exhausted:**
+
+1. Check storage usage:
+   ```bash
+   stella policy storage disk-usage --verbose
+   ```
+
+2. Clean old policy versions:
+   ```bash
+   stella policy versions cleanup --older-than 30d
+   ```
+
+3. If usage is still high after cleanup, increase storage capacity.
+
+**If storage is corrupted:**
+
+1. Verify storage integrity:
+   ```bash
+   stella policy storage verify
+   ```
+
+2. Restore from backup:
+   ```bash
+   stella policy storage restore --from-backup latest
+   ```
+
+### Verification
+
+```bash
+# Check storage status
+stella policy storage status
+
+# Test write operation
+stella policy storage test-write
+
+# Test policy update
+stella policy update --test
+
+# Verify no errors
+stella policy logs --filter "storage" --level error --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Monitoring:** Alert on storage connection failures immediately
+- [ ] **Redundancy:** Configure backup storage for failover
+- [ ] **Cleanup:** Schedule regular cleanup of old policy versions
+- [ ] **Capacity:** Monitor disk usage and plan for growth
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/policy/storage.md`
+- **Related runbooks:** `policy-opa-crash.md`, `postgres-ops.md`
+- **Database setup:** `docs/operations/database-configuration.md`
--- a/docs/operations/runbooks/policy-version-mismatch.md
+++ b/docs/operations/runbooks/policy-version-mismatch.md
@@ -0,0 +1,195 @@
+# Runbook: Policy Engine - Policy Version Conflicts
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-003 - Policy Engine Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Policy Engine |
+| **Severity** | Medium |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.policy.version-consistency` |
+
+---
+
+## Symptoms
+
+- [ ] Policy evaluation returning unexpected results
+- [ ] Alert `PolicyVersionMismatch` firing
+- [ ] Error: "policy version conflict" or "bundle version mismatch"
+- [ ] Different nodes evaluating with different policy versions
+- [ ] Inconsistent gate decisions for same artifact
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Inconsistent policy decisions; unpredictable gate results |
+| **Data integrity** | Decisions may not match expected policy behavior |
+| **SLA impact** | Gate accuracy SLO violated; trust in decisions reduced |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.policy.version-consistency
+   ```
+
+2. **Check policy version across nodes:**
+   ```bash
+   stella policy version --all-nodes
+   ```
+
+3. **Check active policy version:**
+   ```bash
+   stella policy active --show-version
+   ```
+
+### Deep diagnosis
+
+1. **Compare versions across instances:**
+   ```bash
+   stella policy version diff --all-instances
+   ```
+   Problem if: Different versions on different nodes
+
+2. **Check bundle distribution status:**
+   ```bash
+   stella policy bundle status --all-nodes
+   ```
+
+3. **Check for failed deployments:**
+   ```bash
+   stella policy deployments list --status failed --last 24h
+   ```
+
+4. **Check OPA bundle sync:**
+   ```bash
+   stella policy opa bundle-status
+   ```
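+
+Step 1 boils down to "more than one distinct version across nodes". The sketch below assumes you have already turned the output of `stella policy version --all-nodes` into `node=version` pairs. That output format is not documented here, so the parsing is left out.
+
+```bash
+#!/usr/bin/env bash
+# Print the number of distinct policy versions across "node=version" pairs
+# and succeed only when every node reports the same one.
+# Usage: check_version_drift node-a=v42 node-b=v42 node-c=v41
+check_version_drift() {
+  if (( $# == 0 )); then
+    echo "check_version_drift: no nodes given" >&2
+    return 2
+  fi
+  local distinct
+  distinct=$(printf '%s\n' "$@" | cut -d= -f2- | sort -u | wc -l)
+  distinct=$(( distinct ))   # strip padding some wc implementations add
+  echo "$distinct"
+  (( distinct == 1 ))
+}
+```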
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Force sync to latest version:**
+   ```bash
+   stella policy sync --force --all-nodes
+   ```
+
+2. **Pin specific version:**
+   ```bash
+   stella policy pin --version <version>
+   stella policy sync --all-nodes
+   ```
+
+3. **Restart policy engines to force reload:**
+   ```bash
+   stella service restart policy-engine --all-nodes
+   ```
+
+### Root cause fix
+
+**If bundle distribution failed:**
+
+1. Check bundle storage:
+   ```bash
+   stella policy bundle storage-status
+   ```
+
+2. Rebuild and redistribute bundle:
+   ```bash
+   stella policy bundle build
+   stella policy bundle distribute --all-nodes
+   ```
+
+**If a node is out of sync:**
+
+1. Check specific node status:
+   ```bash
+   stella policy status --node <node-id>
+   ```
+
+2. Force node resync:
+   ```bash
+   stella policy sync --node <node-id> --force
+   ```
+
+3. Verify node is receiving updates:
+   ```bash
+   stella policy bundle check-subscription --node <node-id>
+   ```
+
+**If concurrent deployments caused a conflict:**
+
+1. Check deployment history:
+   ```bash
+   stella policy deployments list --last 1h
+   ```
+
+2. Resolve to single version:
+   ```bash
+   stella policy resolve-conflict --to-version <version>
+   ```
+
+3. Enable deployment locking:
+   ```bash
+   stella policy config set deployment.locking true
+   ```
+
+**If OPA bundle polling is slow or stalled:**
+
+1. Check OPA bundle configuration:
+   ```bash
+   stella policy opa config show | grep bundle
+   ```
+
+2. Decrease polling interval for faster sync:
+   ```bash
+   stella policy opa config set bundle.polling.min_delay_seconds 10
+   stella policy opa config set bundle.polling.max_delay_seconds 30
+   ```
+
+### Verification
+
+```bash
+# Verify all nodes on same version
+stella policy version --all-nodes
+
+# Test consistent evaluation
+stella policy evaluate --test --all-nodes
+
+# Verify bundle status
+stella policy bundle status --all-nodes
+
+# Check no version warnings
+stella policy logs --filter "version" --level warning --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Locking:** Enable deployment locking to prevent concurrent updates
+- [ ] **Monitoring:** Alert on version drift between nodes
+- [ ] **Sync:** Configure aggressive bundle polling for fast convergence
+- [ ] **Testing:** Deploy to staging before production to catch issues
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/policy/versioning.md`
+- **Related runbooks:** `policy-opa-crash.md`, `policy-storage-unavailable.md`
+- **Deployment guide:** `docs/operations/policy-deployment.md`
--- a/docs/operations/runbooks/postgres-ops.md
+++ b/docs/operations/runbooks/postgres-ops.md
@@ -0,0 +1,371 @@
+# PostgreSQL Database Runbook (dev-mock ready)
+
+> **Sprint:** SPRINT_20260117_029_Runbook_coverage_expansion
+> **Task:** RUN-001 - PostgreSQL Operations Runbook
+
+Status: PRODUCTION-READY (2026-01-17 UTC)
+
+## Scope
+PostgreSQL database operations including monitoring, maintenance, backup/restore, and common incident handling for Stella Ops deployments.
+
+---
+
+## Pre-flight Checklist
+
+### Environment Verification
+```bash
+# Check database connection
+stella db ping
+
+# Verify connection pool health
+stella doctor --check check.postgres.connectivity,check.postgres.pool
+
+# Check migration status
+stella db migrations status
+```
+
+### Metrics to Watch
+- `stella_postgres_connections_active` - Active connections (should be < 80% of max)
+- `stella_postgres_query_duration_seconds` - P99 query latency (target: < 100ms)
+- `stella_postgres_pool_waiting` - Requests waiting for a free pool connection (should be 0)
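+
+The 80% guideline for active connections is a simple ratio. In the sketch below, `active` is assumed to come from `stella_postgres_connections_active` and `max` from your configured maximum, for example `Database:MaxPoolSize`. Integer division rounds down, so 79.9% still reads as `ok`.
+
+```bash
+#!/usr/bin/env bash
+# Pool utilization as an integer percentage of the configured maximum,
+# with a verdict against the 80% guideline.
+pool_utilization() {
+  local active=$1 max=$2
+  if (( max <= 0 )); then
+    echo "pool_utilization: max must be positive" >&2
+    return 2
+  fi
+  local pct=$(( active * 100 / max ))
+  if (( pct < 80 )); then
+    echo "ok ${pct}%"
+  else
+    echo "high ${pct}%"
+  fi
+}
+```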
+
+---
+
+## Standard Procedures
+
+### SP-001: Daily Health Check
+
+**Frequency:** Daily or on-demand
+**Duration:** ~5 minutes
+
+1. Run comprehensive health check:
+   ```bash
+   stella doctor --category database --format json > /tmp/db-health-$(date +%Y%m%d).json
+   ```
+
+2. Review slow queries from last 24h:
+   ```bash
+   stella db queries --slow --period 24h --limit 20
+   ```
+
+3. Check replication status (if applicable):
+   ```bash
+   stella db replication status
+   ```
+
+4. Verify backup completion:
+   ```bash
+   stella backup status --type database
+   ```
+
+### SP-002: Connection Pool Tuning
+
+**When:** Pool exhaustion alerts or high wait times
+
+1. Check current pool usage:
+   ```bash
+   stella db pool stats --detailed
+   ```
+
+2. Identify connection-holding queries:
+   ```bash
+   stella db queries --active --sort duration
+   ```
+
+3. Adjust pool size (if needed):
+   ```bash
+   # Review current settings
+   stella config get Database:MaxPoolSize
+   
+   # Increase pool size
+   stella config set Database:MaxPoolSize 150
+   
+   # Restart affected services
+   stella service restart --service release-orchestrator
+   ```
+
+4. Verify improvement:
+   ```bash
+   stella db pool watch --duration 5m
+   ```
+
+### SP-003: Backup and Restore
+
+**Backup:**
+```bash
+# Create immediate backup
+stella backup create --type database --name "pre-upgrade-$(date +%Y%m%d)"
+
+# Verify backup
+stella backup verify --latest
+```
+
+**Restore:**
+```bash
+# List available backups
+stella backup list --type database
+
+# Restore a specific backup (CAUTION: destructive; overwrites the current database)
+stella backup restore --id <backup-id> --confirm
+
+# Verify restoration
+stella db ping
+stella db migrations status
+```
+
+### SP-004: Migration Execution
+
+1. Pre-migration backup:
+   ```bash
+   stella backup create --type database --name "pre-migration"
+   ```
+
+2. Run migrations:
+   ```bash
+   # Dry run first
+   stella db migrate --dry-run
+   
+   # Apply migrations
+   stella db migrate
+   ```
+
+3. Verify migration success:
+   ```bash
+   stella db migrations status
+   stella doctor --check check.postgres.migrations
+   ```
+
+---
+
+## Incident Procedures
+
+### INC-001: Connection Pool Exhaustion
+
+**Symptoms:**
+- Alert: `StellaPostgresPoolExhausted`
+- Error logs: "connection pool exhausted, waiting for available connection"
+- Increased request latency
+
+**Investigation:**
+```bash
+# Check pool status
+stella db pool stats
+
+# Find long-running queries
+stella db queries --active --sort duration --limit 10
+
+# Check for connection leaks
+stella db connections --by-client
+```
+
+**Resolution:**
+
+1. **Immediate relief** - Terminate long-running queries:
+   ```bash
+   # Identify stuck queries
+   stella db queries --active --duration ">5m"
+   
+   # Terminate specific query (use with caution)
+   stella db query terminate --pid <pid>
+   ```
+
+2. **Scale the pool** (if the load is legitimate):
+   ```bash
+   stella config set Database:MaxPoolSize 200
+   stella service restart --graceful
+   ```
+
+3. **Fix leaks** (if caused by an application bug):
+   - Review application logs for unclosed connections
+   - Deploy fix to affected service
+
+### INC-002: Slow Query Performance
+
+**Symptoms:**
+- Alert: `StellaPostgresQueryLatencyHigh`
+- P99 query latency > 500ms
+
+**Investigation:**
+```bash
+# Get slow query report
+stella db queries --slow --period 1h --format json > /tmp/slow-queries.json
+
+# Analyze specific query
+stella db query explain --sql "SELECT ..." --analyze
+
+# Check table statistics
+stella db stats tables --sort bloat
+```
+
+**Resolution:**
+
+1. **Index optimization:**
+   ```bash
+   # Get index recommendations
+   stella db index suggest --table <table>
+   
+   # Create recommended index
+   stella db index create --table <table> --columns "col1,col2"
+   ```
+
+2. **Vacuum/analyze:**
+   ```bash
+   stella db vacuum --table <table>
+   stella db analyze --table <table>
+   ```
+
+3. **Query optimization** - Review and rewrite problematic queries
+
+### INC-003: Database Connectivity Loss
+
+**Symptoms:**
+- Alert: `StellaPostgresConnectionFailed`
+- All services reporting database connection errors
+
+**Investigation:**
+```bash
+# Test basic connectivity
+stella db ping
+
+# Check DNS resolution
+stella network dns-lookup <db-host>
+
+# Check firewall/network
+stella network test --host <db-host> --port 5432
+```
+
+**Resolution:**
+
+1. **Network issue:**
+   - Verify security groups / firewall rules
+   - Check VPN/tunnel status if applicable
+   - Verify DNS resolution
+
+2. **Database server issue:**
+   - Check PostgreSQL service status on server
+   - Review PostgreSQL logs
+   - Check disk space on database server
+
+3. **Credential issue:**
+   ```bash
+   stella db verify-credentials
+   stella secrets rotate --scope database
+   ```
+
+### INC-004: Disk Space Alert
+
+**Symptoms:**
+- Alert: `StellaPostgresDiskSpaceWarning` or `Critical`
+- Database write failures
+
+**Investigation:**
+```bash
+# Check disk usage
+stella db disk-usage
+
+# Find large tables
+stella db stats tables --sort size --limit 20
+
+# Check for bloat
+stella db stats tables --sort bloat
+```
+
+**Resolution:**
+
+1. **Immediate cleanup:**
+   ```bash
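+   # VACUUM FULL rewrites the table to return space to the OS; it holds an
+   # exclusive lock and temporarily needs free space about equal to the table size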
+   stella db vacuum --full --table <large-table>
+   
+   # Preview pruning of old data (if retention policy allows); rerun without --dry-run to apply
+   stella db prune --table evidence_artifacts --older-than 90d --dry-run
+   ```
+
+2. **Archive old data:**
+   ```bash
+   stella db archive --table findings_history --older-than 180d
+   ```
+
+3. **Expand disk** (if legitimate growth):
+   - Follow cloud provider procedure to expand volume
+   - Resize filesystem
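+
+VACUUM FULL writes a complete new copy of the table and its indexes before dropping the old one. When disk is already tight, it can therefore fail partway through. The sketch below is a rough pre-check. It assumes you have the relation's size in bytes, for example from PostgreSQL's `pg_total_relation_size('<large-table>')`, and the free bytes on the data volume, for example from `df`. Using the current size is conservative, because the rewritten copy drops dead tuples and is usually smaller.
+
+```bash
+#!/usr/bin/env bash
+# Rough check that free space covers a full rewrite of a relation
+# (table + indexes + TOAST) before running VACUUM FULL on it.
+can_vacuum_full() {
+  local relation_bytes=$1 free_bytes=$2
+  if (( free_bytes >= relation_bytes )); then
+    echo "ok: ${free_bytes} bytes free for a ${relation_bytes}-byte rewrite"
+  else
+    echo "insufficient: need ~${relation_bytes}, have ${free_bytes}" >&2
+    return 1
+  fi
+}
+```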
+
+---
+
+## Maintenance Windows
+
+### Weekly Maintenance (Sunday 02:00 UTC)
+
+1. Run vacuum analyze on all tables:
+   ```bash
+   stella db vacuum --analyze --all-tables
+   ```
+
+2. Update table statistics:
+   ```bash
+   stella db analyze --all-tables
+   ```
+
+3. Clean temporary files:
+   ```bash
+   stella db cleanup --temp-files
+   ```
+
+### Monthly Maintenance (First Sunday 03:00 UTC)
+
+1. Full vacuum on large tables:
+   ```bash
+   stella db vacuum --full --table findings --table verdicts
+   ```
+
+2. Reindex if needed:
+   ```bash
+   stella db reindex --concurrently --table findings
+   ```
+
+3. Archive old data per retention policy:
+   ```bash
+   stella db archive --apply-retention
+   ```
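+
+If you schedule these windows with cron, note that "first Sunday of the month" cannot be written as a single cron expression. When both the day-of-month and day-of-week fields are restricted, cron runs the job if either field matches. A common workaround is to schedule days 1-7 and check the weekday inside the job. The sketch below assumes the host clock runs on UTC. The script paths are hypothetical placeholders.
+
+```bash
+#!/usr/bin/env bash
+# Example crontab entries (hypothetical script paths; % must be escaped
+# as \% inside a crontab line):
+#
+#   0 2 * * 0    /path/to/weekly-maintenance.sh
+#   0 3 1-7 * *  [ "$(date -u +\%u)" = 7 ] && /path/to/monthly-maintenance.sh
+#
+# The same guard as a function: pass day-of-month and ISO weekday
+# (1 = Monday ... 7 = Sunday), e.g.
+#   is_first_sunday "$(date -u +%d)" "$(date -u +%u)"
+is_first_sunday() {
+  # 10# forces base 10 so "08" and "09" from date +%d are not read as octal.
+  local dom=$((10#$1)) weekday=$2
+  (( dom >= 1 && dom <= 7 && weekday == 7 ))
+}
+```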
+
+---
+
+## Monitoring Dashboard
+
+Access: Grafana → Dashboards → Stella Ops → PostgreSQL
+
+Key panels:
+- Connection pool utilization
+- Query latency percentiles
+- Disk usage trend
+- Replication lag (if applicable)
+- Active queries count
+
+---
+
+## Evidence Capture
+
+For any incident, capture:
+```bash
+# Comprehensive database state
+stella db diagnostics --output /tmp/db-diag-$(date +%Y%m%dT%H%M%S).tar.gz
+```
+
+Bundle includes:
+- Connection stats
+- Active queries
+- Lock information
+- Table statistics
+- Recent slow query log
+- Configuration snapshot
+
+---
+
+## Escalation Path
+
+1. **L1 (On-call):** Standard procedures, restart services
+2. **L2 (Database team):** Query optimization, schema changes
+3. **L3 (Vendor support):** Hardware/cloud platform issues
+
+---
+
+_Last updated: 2026-01-17 (UTC)_
--- a/docs/operations/runbooks/scanner-oom.md
+++ b/docs/operations/runbooks/scanner-oom.md
@@ -0,0 +1,152 @@
+# Runbook: Scanner - Out of Memory on Large Images
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-002 - Scanner Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Scanner |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.scanner.memory-usage` |
+
+---
+
+## Symptoms
+
+- [ ] Scanner worker exits with code 137 (OOM killed)
+- [ ] Scans fail consistently for specific large images
+- [ ] Error log contains "fatal error: runtime: out of memory"
+- [ ] Alert `ScannerWorkerOOM` firing
+- [ ] Metric `scanner_worker_restarts_total{reason="oom"}` increasing
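+
+Exit codes above 128 conventionally mean the process was terminated by signal `code - 128`. 137 is 128 + 9, which is SIGKILL, the signal the Linux OOM killer sends. A 137 alone does not prove an OOM, because any SIGKILL produces it. Confirm with the OOM metric above or the node's kernel log. This small helper decodes such codes using bash's `kill -l`.
+
+```bash
+#!/usr/bin/env bash
+# Decode a process/container exit code into "signal N (NAME)" or a plain status.
+describe_exit() {
+  local code=$1
+  if (( code > 128 )); then
+    local sig=$(( code - 128 ))
+    echo "signal $sig ($(kill -l "$sig"))"
+  else
+    echo "exit status $code"
+  fi
+}
+```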
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Large images cannot be scanned; smaller images may still work |
+| **Data integrity** | No data loss; failed scans can be retried |
+| **SLA impact** | Specific images blocked from release pipeline |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Identify the failing image:**
+   ```bash
+   stella scanner jobs list --status failed --last 1h
+   ```
+
+2. **Check image size:**
+   ```bash
+   stella image inspect <image-ref> --format json | jq '.size'
+   ```
+   Problem if: Image size > 2GB or layer count > 100
+
+3. **Check worker memory limit:**
+   ```bash
+   stella scanner config get worker.memory_limit
+   ```
+
+### Deep diagnosis
+
+1. **Profile memory usage during scan:**
+   ```bash
+   stella scan image --image <image-ref> --profile-memory
+   ```
+
+2. **Check SBOM generation memory:**
+   ```bash
+   stella scanner logs --filter "sbom" --level debug --last 30m
+   ```
+   Look for: "memory allocation failed", "heap exhausted"
+
+3. **Identify memory-heavy layers:**
+   ```bash
+   stella image layers <image-ref> --sort-by size
+   ```
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Increase worker memory limit:**
+   ```bash
+   stella scanner config set worker.memory_limit 8Gi
+   stella scanner workers restart
+   ```
+
+2. **Enable streaming mode for large images:**
+   ```bash
+   stella scanner config set sbom.streaming_threshold 1Gi
+   stella scanner workers restart
+   ```
+
+3. **Retry the failed scan:**
+   ```bash
+   stella scan image --image <image-ref> --retry
+   ```
+
+### Root cause fix
+
+**For consistently large images:**
+
+1. Configure dedicated large-image worker pool:
+   ```bash
+   stella scanner workers add --pool large-images --memory 16Gi --count 2
+   stella scanner config set routing.large_image_threshold 2Gi
+   stella scanner config set routing.large_image_pool large-images
+   ```
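+
+The routing rule above amounts to a size comparison against 2Gi, which is 2 × 1024³ bytes. The sketch below is illustrative only. It assumes the size is in bytes, as in the `jq '.size'` output from Quick checks, and that images strictly larger than the threshold go to the large pool. The pool name `default` is hypothetical.
+
+```bash
+#!/usr/bin/env bash
+# Pick a worker pool by image size, mirroring routing.large_image_threshold=2Gi.
+LARGE_IMAGE_THRESHOLD_BYTES=$(( 2 * 1024 * 1024 * 1024 ))
+
+route_pool() {
+  local size_bytes=$1
+  if (( size_bytes > LARGE_IMAGE_THRESHOLD_BYTES )); then
+    echo large-images
+  else
+    echo default   # hypothetical name for the regular pool
+  fi
+}
+```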
+
+**For images with many small files (node_modules, etc.):**
+
+1. Enable incremental SBOM mode:
+   ```bash
+   stella scanner config set sbom.incremental_mode true
+   ```
+
+**For base image reuse:**
+
+1. Enable layer caching:
+   ```bash
+   stella scanner config set cache.layer_dedup true
+   ```
+
+### Verification
+
+```bash
+# Retry the previously failing scan
+stella scan image --image <image-ref>
+
+# Monitor memory during scan
+stella scanner workers stats --watch
+
+# Verify no OOM in recent logs
+stella scanner logs --filter "out of memory" --last 1h
+```
+
+---
+
+## Prevention
+
+- [ ] **Capacity:** Set memory limit based on largest expected image (recommend 4Gi minimum)
+- [ ] **Routing:** Configure large-image pool for images > 2GB
+- [ ] **Monitoring:** Alert on `scanner_worker_memory_usage_bytes` > 80% of limit
+- [ ] **Documentation:** Document image size limits in user guide
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/scanner/architecture.md`
+- **Related runbooks:** `scanner-worker-stuck.md`, `scanner-timeout.md`
+- **Dashboard:** Grafana > Stella Ops > Scanner Memory
--- a/docs/operations/runbooks/scanner-registry-auth.md
+++ b/docs/operations/runbooks/scanner-registry-auth.md
@@ -0,0 +1,195 @@
+# Runbook: Scanner - Registry Authentication Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-002 - Scanner Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Scanner |
+| **Severity** | High |
+| **On-call scope** | Platform team, Security team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.scanner.registry-auth` |
+
+---
+
+## Symptoms
+
+- [ ] Scans failing with "401 Unauthorized" or "403 Forbidden"
+- [ ] Alert `ScannerRegistryAuthFailed` firing
+- [ ] Error: "failed to authenticate with registry"
+- [ ] Error: "failed to pull image manifest"
+- [ ] Scans work for public images but fail for private images
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Cannot scan private images; release pipeline blocked |
+| **Data integrity** | No data loss; authentication issue only |
+| **SLA impact** | All scans for affected registry blocked |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.scanner.registry-auth
+   ```
+
+2. **List configured registries:**
+   ```bash
+   stella registry list --show-status
+   ```
+   Look for: Registries with "auth_failed" status
+
+3. **Test registry authentication:**
+   ```bash
+   stella registry test <registry-url>
+   ```
+
+### Deep diagnosis
+
+1. **Check credential expiration:**
+   ```bash
+   stella registry credentials show <registry-name>
+   ```
+   Look for: Expiration date, token type
+
+2. **Test with verbose output:**
+   ```bash
+   stella registry test <registry-url> --verbose
+   ```
+   Look for: Specific auth error message, HTTP status code
+
+3. **Check scanner logs for registry auth errors:**
+   ```bash
+   stella scanner logs --filter "registry auth" --last 30m
+   ```
+
+4. **Verify IAM/OIDC configuration (for cloud registries):**
+   ```bash
+   stella registry iam-status <registry-name>
+   ```
+   Problem if: IAM role not assumable, OIDC token expired
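+
+The HTTP status code from the verbose test usually narrows down the cause. Under standard HTTP semantics, 401 means the registry did not accept any credentials. 403 means it accepted them but they lack permission. 429 means rate limiting, not an auth failure. The helper below is a convenience sketch and is not part of the `stella` CLI.
+
+```bash
+#!/usr/bin/env bash
+# Map a registry HTTP status code to the most likely cause.
+registry_status_hint() {
+  case "$1" in
+    401) echo "unauthenticated: credentials missing, wrong, or expired" ;;
+    403) echo "forbidden: credentials accepted but lack pull permission" ;;
+    429) echo "rate limited: back off or authenticate for a higher quota" ;;
+    2??) echo "ok" ;;
+    *)   echo "unexpected status $1" ;;
+  esac
+}
+```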
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Refresh credentials (for token-based auth):**
+   ```bash
+   stella registry refresh-credentials <registry-name>
+   ```
+
+2. **Update static credentials:**
+   ```bash
+   stella registry update-credentials <registry-name> \
+     --username <user> \
+     --password <token>
+   ```
+
+3. **For Docker Hub rate limiting (HTTP 429), authenticate to raise the pull limit:**
+   ```bash
+   stella registry configure docker-hub \
+     --username <user> \
+     --access-token <token>
+   ```
+
+### Root cause fix
+
+**If credentials expired:**
+
+1. Generate a new access token in the registry (ECR, GCR, ACR, etc.).
+
+2. Update credentials:
+   ```bash
+   stella registry update-credentials <registry-name> --from-env
+   ```
+
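+3. Configure automatic token refresh ahead of expiry (for example, ECR authorization tokens are valid for 12 hours, so an 11h interval refreshes before they lapse):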
+   ```bash
+   stella registry config set <registry-name>.auto_refresh true
+   stella registry config set <registry-name>.refresh_interval 11h
+   ```
+
+**If IAM role/policy changed (AWS ECR):**
+
+1. Verify IAM role permissions:
+   ```bash
+   stella registry iam verify <registry-name>
+   ```
+
+2. Update IAM role ARN if changed:
+   ```bash
+   stella registry configure ecr \
+     --region <region> \
+     --role-arn <arn>
+   ```
+
+**If OIDC federation changed (GCP Artifact Registry):**
+
+1. Verify service account:
+   ```bash
+   stella registry oidc verify <registry-name>
+   ```
+
+2. Update workload identity configuration:
+   ```bash
+   stella registry configure gcr \
+     --project <project> \
+     --workload-identity-provider <provider>
+   ```
+
+**If the registry's TLS certificate changed (self-hosted registries):**
+
+1. Update CA certificate:
+   ```bash
+   stella registry configure <registry-name> \
+     --ca-cert /path/to/ca.crt
+   ```
+
+2. As a last resort, skip TLS certificate verification (not recommended for production):
+   ```bash
+   stella registry configure <registry-name> \
+     --insecure-skip-verify
+   ```
+
+### Verification
+
+```bash
+# Test authentication
+stella registry test <registry-url>
+
+# Test scanning a private image
+stella scan image --image <registry-url>/<image>:<tag> --dry-run
+
+# Verify no auth failures in recent logs
+stella scanner logs --filter "auth" --level error --last 30m
+```
+
+---
+
+## Prevention
+
+- [ ] **Credentials:** Use service accounts/workload identity instead of static tokens
+- [ ] **Rotation:** Configure automatic token refresh before expiration
+- [ ] **Monitoring:** Alert on authentication failure rate > 0
+- [ ] **Documentation:** Document registry credential management procedures
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/scanner/registry-auth.md`
+- **Related runbooks:** `scanner-worker-stuck.md`, `scanner-timeout.md`
+- **Registry setup:** `docs/operations/registry-configuration.md`
--- a/docs/operations/runbooks/scanner-sbom-generation-failed.md
+++ b/docs/operations/runbooks/scanner-sbom-generation-failed.md
@@ -0,0 +1,188 @@
+# Runbook: Scanner - SBOM Generation Failures
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-002 - Scanner Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Scanner |
+| **Severity** | High |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.scanner.sbom-generation` |
+
+---
+
+## Symptoms
+
+- [ ] Scans completing but SBOM generation failing
+- [ ] Alert `ScannerSbomGenerationFailed` firing
+- [ ] Error: "SBOM generation failed" or "unsupported package format"
+- [ ] Partial SBOM with missing components
+- [ ] Metric `scanner_sbom_generation_failures_total` increasing
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Incomplete vulnerability coverage; missing dependencies not scanned |
+| **Data integrity** | Partial SBOM may miss vulnerabilities; attestations incomplete |
+| **SLA impact** | SBOM completeness SLO violated (target: > 95%) |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.scanner.sbom-generation
+   ```
+
+2. **Check failed SBOM jobs:**
+   ```bash
+   stella scanner jobs list --status sbom_failed --last 1h
+   ```
+
+3. **Check SBOM completeness rate:**
+   ```bash
+   stella scanner stats --sbom-metrics
+   ```
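+
+The completeness rate maps onto two thresholds in this runbook: the SLO target (> 95%) from the Impact table and the < 90% alert from Prevention. Below is a minimal shell sketch of that classification. It assumes completeness is measured as resolved components divided by detected components; the exact definition behind `--sbom-metrics` is not documented here. `sbom_completeness` is a hypothetical helper.
+
+```bash
+# Hypothetical helper: classify SBOM completeness against this runbook's
+# thresholds (SLO target > 95%, alert below 90%).
+# Shell arithmetic is integer-only, so work in basis points (1 bp = 0.01%).
+sbom_completeness() {
+  resolved=$1 total=$2
+  if [ "$total" -eq 0 ]; then
+    echo "no components detected"
+    return 1
+  fi
+  bp=$((resolved * 10000 / total))
+  pct="$((bp / 100)).$(printf '%02d' $((bp % 100)))%"
+  if [ "$bp" -gt 9500 ]; then
+    echo "ok: $pct meets the > 95% SLO target"
+  elif [ "$bp" -ge 9000 ]; then
+    echo "slo-breach: $pct is not above 95%"
+  else
+    echo "alert: $pct is below the 90% alert threshold"
+  fi
+}
+```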
+
+### Deep diagnosis
+
+1. **Analyze specific failure:**
+   ```bash
+   stella scanner job details <job-id> --sbom-errors
+   ```
+   Look for: Specific package manager or file type causing failure
+
+2. **Check for unsupported ecosystems:**
+   ```bash
+   stella sbom analyze --image <image-ref> --verbose
+   ```
+   Look for: "unsupported", "unknown package format", "parsing failed"
+
+3. **Check scanner plugin status:**
+   ```bash
+   stella scanner plugins list --status
+   ```
+   Problem if: Package manager plugin disabled or erroring
+
+4. **Check for corrupted package files:**
+   ```bash
+   stella image inspect <image-ref> --check-integrity
+   ```
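+
+When several images fail, counting the error patterns listed above across saved `stella sbom analyze --verbose` output shows which ecosystem or format dominates. Below is a minimal sketch that uses standard `grep`. `sbom_error_counts` is a hypothetical helper, and it assumes the verbose output has been saved to a file.
+
+```bash
+# Hypothetical helper: count lines matching each failure pattern named in this
+# runbook, reading saved `stella sbom analyze --verbose` output from stdin.
+sbom_error_counts() {
+  input=$(cat)
+  for pattern in "unsupported" "unknown package format" "parsing failed"; do
+    # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps
+    # scripts running under `set -e`.
+    count=$(printf '%s\n' "$input" | grep -ci -- "$pattern" || true)
+    echo "$pattern: $count"
+  done
+}
+```
+
+For example, save the output with `stella sbom analyze --image <image-ref> --verbose > analyze.log 2>&1`, then run `sbom_error_counts < analyze.log`.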
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Enable fallback SBOM generation:**
+   ```bash
+   stella scanner config set sbom.fallback_mode true
+   stella scan image --image <image-ref> --sbom-fallback
+   ```
+
+2. **Use alternative SBOM generator:**
+   ```bash
+   stella sbom generate --image <image-ref> --generator syft --output sbom.json
+   ```
+
+3. **Generate partial SBOM and continue:**
+   ```bash
+   stella scan image --image <image-ref> --sbom-partial-ok
+   ```
+
+### Root cause fix
+
+**If the package manager is not supported:**
+
+1. Check supported package managers:
+   ```bash
+   stella scanner plugins list --type package-manager
+   ```
+
+2. Enable additional plugins:
+   ```bash
+   stella scanner plugins enable <plugin-name>
+   ```
+
+3. For custom package formats, add mapping:
+   ```bash
+   stella scanner config set sbom.custom_mappings.<format> <handler>
+   ```
+
+**If package files are corrupted:**
+
+1. Identify corrupted files:
+   ```bash
+   stella image layers <image-ref> --verify-packages
+   ```
+
+2. Report the corrupted files to the image owner for a fix
+
+**If generation runs out of memory or other resources:**
+
+1. Increase SBOM generator resources:
+   ```bash
+   stella scanner config set sbom.memory_limit 4Gi
+   stella scanner config set sbom.timeout 10m
+   ```
+
+2. Enable streaming mode:
+   ```bash
+   stella scanner config set sbom.streaming_mode true
+   ```
+
+**If a plugin crashed:**
+
+1. Check plugin logs:
+   ```bash
+   stella scanner plugins logs <plugin-name> --last 30m
+   ```
+
+2. Restart plugin:
+   ```bash
+   stella scanner plugins restart <plugin-name>
+   ```
+
+### Verification
+
+```bash
+# Retry SBOM generation
+stella sbom generate --image <image-ref> --output sbom.json
+
+# Validate SBOM completeness
+stella sbom validate --file sbom.json --check-completeness
+
+# Check component count
+stella sbom stats --file sbom.json
+
+# Full scan with SBOM
+stella scan image --image <image-ref>
+```
+
+---
+
+## Prevention
+
+- [ ] **Plugins:** Keep all package manager plugins enabled and updated
+- [ ] **Monitoring:** Alert on SBOM completeness < 90%
+- [ ] **Fallback:** Configure fallback SBOM generator for resilience
+- [ ] **Testing:** Test SBOM generation for new image types before production
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/scanner/sbom-generation.md`
+- **Related runbooks:** `scanner-oom.md`, `scanner-timeout.md`
+- **SBOM formats:** `docs/formats/sbom-spdx.md`, `docs/formats/sbom-cyclonedx.md`
--- a/docs/operations/runbooks/scanner-timeout.md
+++ b/docs/operations/runbooks/scanner-timeout.md
@@ -0,0 +1,174 @@
+# Runbook: Scanner - Scan Timeout on Complex Images
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-002 - Scanner Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Scanner |
+| **Severity** | Medium |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.scanner.timeout-rate` |
+
+---
+
+## Symptoms
+
+- [ ] Scans failing with "timeout exceeded" error
+- [ ] Alert `ScannerTimeoutExceeded` firing
+- [ ] Metric `scanner_scan_timeout_total` increasing
+- [ ] Specific images consistently timing out
+- [ ] Error log: "scan operation exceeded timeout of X seconds"
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | Specific images cannot be scanned; pipeline blocked |
+| **Data integrity** | No data loss; scans can be retried with adjusted settings |
+| **SLA impact** | Release pipeline delayed for affected images |
+
+---
+
+## Diagnosis
+
+### Quick checks
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.scanner.timeout-rate
+   ```
+
+2. **Identify failing images:**
+   ```bash
+   stella scanner jobs list --status timeout --last 1h
+   ```
+   Look for: Patterns in image types or sizes
+
+3. **Check current timeout settings:**
+   ```bash
+   stella scanner config get timeouts
+   ```
+
+### Deep diagnosis
+
+1. **Analyze image complexity:**
+   ```bash
+   stella image inspect <image-ref> --format json | jq '{size, layers: .layers | length, files: .manifest.fileCount}'
+   ```
+   Problem if: > 50 layers, > 100k files, or > 5GB size
+
+2. **Check scanner worker load:**
+   ```bash
+   stella scanner workers stats
+   ```
+   Problem if: All workers at capacity during timeouts
+
+3. **Profile a scan:**
+   ```bash
+   stella scan image --image <image-ref> --profile --verbose
+   ```
+   Look for: Which phase is slowest (layer extraction, SBOM generation, vuln matching)
+
+4. **Check for filesystem-heavy images:**
+   ```bash
+   stella image layers <image-ref> --sort-by file-count
+   ```
+   Problem if: Single layer with > 50k files (e.g., node_modules)
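+
+The complexity thresholds in this section can be checked together once the numbers are known. Below is a minimal shell sketch. `image_complexity` is a hypothetical helper that takes plain numeric arguments, because the JSON layout of `stella image inspect` is not documented here. It also reads "5GB" as 5 GiB, which is an assumption.
+
+```bash
+# Hypothetical helper: flag the complexity thresholds from this runbook.
+# Args: layer count, total file count, image size in bytes,
+#       largest file count in any single layer.
+image_complexity() {
+  layers=$1 files=$2 size_bytes=$3 max_layer_files=$4
+  reasons=""
+  [ "$layers" -gt 50 ] && reasons="$reasons layers>50"
+  [ "$files" -gt 100000 ] && reasons="$reasons files>100k"
+  # Assumption: "5GB" is interpreted as 5 GiB.
+  [ "$size_bytes" -gt $((5 * 1024 * 1024 * 1024)) ] && reasons="$reasons size>5GiB"
+  [ "$max_layer_files" -gt 50000 ] && reasons="$reasons layer-files>50k"
+  if [ -n "$reasons" ]; then
+    echo "complex:$reasons"
+  else
+    echo "normal"
+  fi
+}
+```
+
+For example, `image_complexity 60 120000 6000000000 70000` prints `complex: layers>50 files>100k size>5GiB layer-files>50k`.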
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Increase timeout for specific image:**
+   ```bash
+   stella scan image --image <image-ref> --timeout 30m
+   ```
+
+2. **Increase global scan timeout:**
+   ```bash
+   stella scanner config set timeouts.scan 20m
+   stella scanner workers restart
+   ```
+
+3. **Enable fast mode for initial scan:**
+   ```bash
+   stella scan image --image <image-ref> --fast-mode
+   ```
+
+### Root cause fix
+
+**If image is too complex:**
+
+1. Enable incremental scanning:
+   ```bash
+   stella scanner config set scan.incremental_mode true
+   ```
+
+2. Configure layer caching:
+   ```bash
+   stella scanner config set cache.layer_dedup true
+   stella scanner config set cache.sbom_cache true
+   ```
+
+**If filesystem is too large:**
+
+1. Enable streaming SBOM generation:
+   ```bash
+   stella scanner config set sbom.streaming_threshold 500Mi
+   ```
+
+2. Configure file sampling for massive images:
+   ```bash
+   stella scanner config set sbom.file_sample_max 100000
+   ```
+
+**If vulnerability matching is slow:**
+
+1. Enable parallel matching:
+   ```bash
+   stella scanner config set vuln.parallel_matching true
+   stella scanner config set vuln.match_workers 4
+   ```
+
+2. Optimize vulnerability database indexes:
+   ```bash
+   stella db optimize --component scanner
+   ```
+
+### Verification
+
+```bash
+# Retry the previously failing scan
+stella scan image --image <image-ref> --timeout 30m
+
+# Monitor scan progress
+stella scanner jobs watch <job-id>
+
+# Verify no timeouts in recent scans
+stella scanner jobs list --status timeout --last 1h
+```
+
+---
+
+## Prevention
+
+- [ ] **Capacity:** Configure appropriate timeouts based on expected image complexity (15m default, 30m for large images)
+- [ ] **Monitoring:** Alert on timeout rate > 5%
+- [ ] **Caching:** Enable layer and SBOM caching for base images
+- [ ] **Documentation:** Document image size/complexity limits in user guide
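+
+The > 5% timeout-rate alert compares timed-out scans with all scans in the same window. Below is a minimal shell sketch. `timeout_rate_alert` is a hypothetical helper, and the two counts are assumed to come from `stella scanner jobs list` output for the same time window.
+
+```bash
+# Hypothetical helper: decide whether the timeout rate exceeds the 5% alert
+# threshold. Cross-multiplies (timeouts * 100 > total * 5) to stay in integers.
+timeout_rate_alert() {
+  timeouts=$1 total=$2
+  if [ "$total" -eq 0 ]; then
+    echo "no scans in window"
+  elif [ $((timeouts * 100)) -gt $((total * 5)) ]; then
+    echo "alert: $timeouts/$total scans timed out (> 5%)"
+  else
+    echo "ok: $timeouts/$total scans timed out"
+  fi
+}
+```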
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/scanner/architecture.md`
+- **Related runbooks:** `scanner-oom.md`, `scanner-worker-stuck.md`
+- **Dashboard:** Grafana > Stella Ops > Scanner Performance
--- a/docs/operations/runbooks/scanner-worker-stuck.md
+++ b/docs/operations/runbooks/scanner-worker-stuck.md
@@ -0,0 +1,174 @@
+# Runbook: Scanner - Worker Not Processing Jobs
+
+> **Sprint:** SPRINT_20260117_029_DOCS_runbook_coverage
+> **Task:** RUN-002 - Scanner Runbooks
+
+## Metadata
+
+| Field | Value |
+|-------|-------|
+| **Component** | Scanner |
+| **Severity** | Critical |
+| **On-call scope** | Platform team |
+| **Last updated** | 2026-01-17 |
+| **Doctor check** | `check.scanner.worker-health` |
+
+---
+
+## Symptoms
+
+- [ ] Scan jobs stuck in "pending" or "processing" state for >5 minutes
+- [ ] Scanner worker process shows 0% CPU usage
+- [ ] Alert `ScannerWorkerStuck` or `ScannerQueueBacklog` firing
+- [ ] UI shows "Scan in progress" indefinitely
+- [ ] Metric `scanner_jobs_pending` increasing over time
+
+---
+
+## Impact
+
+| Impact Type | Description |
+|-------------|-------------|
+| **User-facing** | New scans cannot complete, blocking CI/CD pipelines and release gates |
+| **Data integrity** | No data loss; pending jobs will resume when worker recovers |
+| **SLA impact** | Scan latency SLO violated if not resolved within 15 minutes |
+
+---
+
+## Diagnosis
+
+### Quick checks (< 2 minutes)
+
+1. **Check Doctor diagnostics:**
+   ```bash
+   stella doctor --check check.scanner.worker-health
+   ```
+
+2. **Check scanner service status:**
+   ```bash
+   stella scanner status
+   ```
+   Expected: "Scanner workers: 4 active, 0 idle"
+   Problem: "Scanner workers: 0 active" or "status: degraded"
+
+3. **Check job queue depth:**
+   ```bash
+   stella scanner queue status
+   ```
+   Expected: Queue depth < 50
+   Problem: Queue depth > 100 or growing rapidly
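+
+The bands above leave depths between 50 and 100 unclassified, and "growing rapidly" can only be judged from two samples. Below is a minimal shell sketch that treats 50 to 100 as a watch band and flags any growth between two readings of `stella scanner queue status`. The watch band and the `queue_health` helper are assumptions, not documented behavior.
+
+```bash
+# Hypothetical helper: classify scanner queue depth using this runbook's bands.
+# Args: previous depth, current depth (two samples a few minutes apart).
+queue_health() {
+  prev=$1 cur=$2
+  if [ "$cur" -gt 100 ]; then
+    state="problem"
+  elif [ "$cur" -ge 50 ]; then
+    state="watch"   # assumption: 50-100 falls between the documented bands
+  else
+    state="ok"
+  fi
+  if [ "$cur" -gt "$prev" ]; then
+    echo "$state (growing: $prev -> $cur)"
+  else
+    echo "$state"
+  fi
+}
+```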
+
+### Deep diagnosis
+
+1. **Check worker process logs:**
+   ```bash
+   stella scanner logs --tail 100 --level error
+   ```
+   Look for: "timeout", "connection refused", "out of memory"
+
+2. **Check Valkey connectivity (job queue):**
+   ```bash
+   stella doctor --check check.storage.valkey
+   ```
+
+3. **Check if workers are OOM-killed:**
+   ```bash
+   stella scanner workers inspect
+   ```
+   Look for: "exit_code: 137" (SIGKILL, usually an OOM kill) or "exit_code: 143" (SIGTERM)
+
+4. **Check resource utilization:**
+   ```bash
+   stella obs metrics --filter scanner --last 10m
+   ```
+   Look for: Memory > 90%, CPU sustained > 95%
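+
+By the usual shell and container convention, an exit code above 128 means the process was terminated by signal (code - 128). Exit code 137 is signal 9 (SIGKILL, which the kernel OOM killer sends), and 143 is signal 15 (SIGTERM). A 137 therefore usually indicates an OOM kill, although any forced kill produces the same code. Below is a minimal decoding sketch; `decode_exit_code` is a hypothetical helper.
+
+```bash
+# Hypothetical helper: decode a worker exit code using the 128+signal convention.
+decode_exit_code() {
+  code=$1
+  if [ "$code" -le 128 ]; then
+    echo "status $code: process exited on its own"
+    return
+  fi
+  sig=$((code - 128))
+  case "$sig" in
+    9)  echo "signal 9 (SIGKILL): likely OOM-killed; confirm with memory metrics" ;;
+    15) echo "signal 15 (SIGTERM): stopped by a restart or shutdown" ;;
+    *)  echo "signal $sig: terminated by signal" ;;
+  esac
+}
+```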
+
+---
+
+## Resolution
+
+### Immediate mitigation
+
+1. **Restart scanner workers:**
+   ```bash
+   stella scanner workers restart
+   ```
+   This terminates the current workers and spawns fresh ones.
+
+2. **If the restart fails, restart the entire scanner service:**
+   ```bash
+   stella service restart scanner
+   ```
+
+3. **Verify workers are processing:**
+   ```bash
+   stella scanner queue status --watch
+   ```
+   The queue depth should start decreasing.
+
+### Root cause fix
+
+**If workers were OOM-killed:**
+
+1. Increase worker memory limit:
+   ```bash
+   stella scanner config set worker.memory_limit 4Gi
+   stella scanner workers restart
+   ```
+
+2. Reduce concurrent scans per worker:
+   ```bash
+   stella scanner config set worker.concurrency 2
+   stella scanner workers restart
+   ```
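+
+The memory limit and the concurrency setting interact. If concurrent scans share a worker's memory roughly evenly (an assumption, since real usage varies by image), each scan gets about `memory_limit / concurrency`. Below is a minimal shell sketch of that arithmetic. `per_scan_mib` is a hypothetical helper that accepts the `Gi` and `Mi` suffixes used in this runbook.
+
+```bash
+# Hypothetical helper: approximate memory available per concurrent scan, in MiB.
+# Accepts the binary suffixes used above (e.g. 4Gi, 512Mi).
+per_scan_mib() {
+  limit=$1 concurrency=$2
+  case "$limit" in
+    *Gi) mib=$(( ${limit%Gi} * 1024 )) ;;
+    *Mi) mib=${limit%Mi} ;;
+    *)   echo "unsupported unit: $limit" >&2; return 1 ;;
+  esac
+  echo $((mib / concurrency))
+}
+
+# 4Gi limit with concurrency 2. Prints: 2048
+per_scan_mib 4Gi 2
+```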
+
+**If Valkey connection failed:**
+
+1. Check Valkey health:
+   ```bash
+   stella doctor --check check.storage.valkey
+   ```
+
+2. Restart Valkey if needed (see `valkey-connection-failure.md`)
+
+**If workers are deadlocked:**
+
+1. Enable deadlock detection:
+   ```bash
+   stella scanner config set worker.deadlock_detection true
+   stella scanner workers restart
+   ```
+
+### Verification
+
+```bash
+# Verify workers are healthy
+stella doctor --check check.scanner.worker-health
+
+# Submit a test scan
+stella scan image --image alpine:latest --dry-run
+
+# Watch queue drain
+stella scanner queue status --watch
+
+# Verify no errors in recent logs
+stella scanner logs --tail 20 --level error
+```
+
+---
+
+## Prevention
+
+- [ ] **Alert:** Ensure the `ScannerQueueBacklog` alert is configured with a threshold below 100 jobs
+- [ ] **Monitoring:** Add Grafana panel for worker memory usage
+- [ ] **Capacity:** Review worker count and memory limits during capacity planning
+- [ ] **Deadlock:** Enable `worker.deadlock_detection` in production
+
+---
+
+## Related Resources
+
+- **Architecture:** `docs/modules/scanner/architecture.md`
+- **Related runbooks:** `scanner-oom.md`, `scanner-timeout.md`
+- **Doctor check:** `src/Doctor/__Plugins/StellaOps.Doctor.Plugin.Scanner/Checks/WorkerHealthCheck.cs`
+- **Dashboard:** Grafana > Stella Ops > Scanner Overview
--- a/docs/product/advisories/17-Jan-2026
+++ b/docs/product/advisories/17-Jan-2026
@@ -1,202 +0,0 @@
-# Product Advisory: AI Economics Moat
-ID: ADVISORY-20260116-AI-ECON-MOAT
-Status: ACTIVE
-Owner intent: Product-wide directive
-Scope: All modules, docs, sprints, and roadmap decisions
-
-## 0) Thesis (why this advisory exists)
-
-In AI economics, code is cheap, software is expensive.
-
-Competitors (and future competitors) can produce large volumes of code quickly. Stella Ops must remain hard to catch by focusing on the parts that are still expensive:
- trust
- operability
- determinism
- evidence integrity
- low-touch onboarding
- low support burden at scale
-
-This advisory defines the product-level objectives and non-negotiable standards that make Stella Ops defensible against "code producers".
-
-## 1) Product positioning (the class we must win)
-
-Stella Ops Suite must be "best in class" for:
-
-Evidence-grade release orchestration for containerized applications outside Kubernetes.
-
-Stella is NOT attempting to be:
- a generic CD platform (Octopus, GitLab, Jenkins replacements)
- a generic vulnerability scanner (Trivy, Grype replacements)
- a "platform of everything" with infinite integrations
-
-The moat is the end-to-end chain:
-digest identity -> evidence -> verdict -> gate -> promotion -> audit export -> deterministic replay
-
-The product wins when customers can run verified releases with minimal human labor and produce auditor-ready evidence.
-
-## 2) Target customer and adoption constraint
-
-Constraint: founder operates solo until ~100 paying customers.
-
-Therefore, the product must be self-serve by default:
- install must be predictable
- failures must be diagnosable without maintainer time
- docs must replace support
- "Doctor" must replace debugging sessions
-
-Support must be an exception, not a workflow.
-
-## 3) The five non-negotiable product invariants
-
-Every meaningful product change MUST preserve and strengthen these invariants:
-
-I1. Evidence-grade by design
- Every verified decision has an evidence trail.
- Evidence is exportable, replayable, and verifiable.
-
-I2. Deterministic replay
- Same inputs -> same outputs.
- A verdict can be reproduced and verified later, not just explained.
-
-I3. Digest-first identity
- Releases are immutable digests, not mutable tags.
- "What is deployed where" is anchored to digests.
-
-I4. Offline-first posture
- Air-gapped and low-egress environments must remain first-class.
- No hidden network dependencies in core flows.
-
-I5. Low-touch operability
- Misconfigurations fail fast at startup with clear messages.
- Runtime failures have deterministic recovery playbooks.
- Doctor provides actionable diagnostics bundles and remediation steps.
-
-If a proposed feature weakens any invariant, it must be rejected or redesigned.
-
-## 4) Moats we build (how Stella stays hard to catch)
-
-M1. Evidence chain continuity (no "glue work" required)
- Scan results, reachability proofs, policy evaluation, approvals, promotions, and exports are one continuous chain.
- Do not require customers to stitch multiple tools together to get audit-grade releases.
-
-M2. Explainability with proof, not narrative
- "Why blocked?" must produce a deterministic trace + referenced evidence artifacts.
- The answer must be replayable, not a one-time explanation.
-
-M3. Operability moat (Doctor + safe defaults)
- Diagnostics must identify root cause, not just symptoms.
- Provide deterministic checklists and fixes.
- Every integration must ship with health checks and failure-mode docs.
-
-M4. Controlled surface area (reduce permutations)
- Ship a small number of Tier-1 golden integrations and targets.
- Keep the plugin system as an escape valve, but do not expand the maintained matrix beyond what solo operations can support.
-
-M5. Standards-grade outputs with stable schemas
- SBOM, VEX, attestations, exports, and decision records must be stable, versioned, and backwards compatible where promised.
- Stability is a moat: auditors and platform teams adopt what they can depend on.
-
-## 5) Explicit non-goals (what to reject quickly)
-
-Reject or de-prioritize proposals that primarily:
- add a generic CD surface without evidence and determinism improvements
- expand integrations broadly without a "Tier-1" support model and diagnostics coverage
- compete on raw scanner breadth rather than evidence-grade gating outcomes
- add UI polish that does not reduce operator labor or support load
- add "AI features" that create nondeterminism or require external calls in core paths
-
-If a feature does not strengthen at least one moat (M1-M5), it is likely not worth shipping now.
-
-## 6) Agent review rubric (use this to evaluate any proposal, advisory, or sprint)
-
-When reviewing any new idea, feature request, PRD, or sprint, score it against:
-
-A) Moat impact (required)
- Which moat does it strengthen (M1-M5)?
- What measurable operator/auditor outcome improves?
-
-B) Support burden risk (critical)
- Does this increase the probability of support tickets?
- Does Doctor cover the new failure modes?
- Are there clear runbooks and error messages?
-
-C) Determinism and evidence risk (critical)
- Does this introduce nondeterminism?
- Are outputs stable, canonical, and replayable?
- Does it weaken evidence chain integrity?
-
-D) Permutation risk (critical)
- Does this increase the matrix of supported combinations?
- Can it be constrained to a "golden path" configuration?
-
-E) Time-to-value impact (important)
- Does this reduce time to first verified release?
- Does it reduce time to answer "why blocked"?
-
-If a proposal scores poorly on B/C/D, it must be redesigned or rejected.
-
-## 7) Definition of Done (feature-level) - do not ship without the boring parts
-
-Any shippable feature must include, at minimum:
-
-DOD-1: Operator story
- Clear user story for operators and auditors, not just developers.
-
-DOD-2: Failure modes and recovery
- Documented expected failures, error codes/messages, and remediation steps.
- Doctor checks added or extended to cover the common failure paths.
-
-DOD-3: Determinism and evidence
- Deterministic outputs where applicable.
- Evidence artifacts linked to decisions.
- Replay or verify path exists if the feature affects verdicts or gates.
-
-DOD-4: Tests
- Unit tests for logic (happy + edge cases).
- Integration tests for contracts (DB, queues, storage where used).
- Determinism tests when outputs are serialized, hashed, or signed.
-
-DOD-5: Documentation
- Docs updated where the feature changes behavior or contracts.
- Include copy/paste examples for the golden path usage.
-
-DOD-6: Observability
- Structured logs and metrics for success/failure paths.
- Explicit "reason codes" for gate decisions and failures.
-
-If the feature cannot afford these, it cannot afford to exist in a solo-scaled product.
-
-## 8) Product-level metrics (what we optimize)
-
-These metrics are the scoreboard. Prioritize work that improves them.
-
-P0 metrics (most important):
- Time-to-first-verified-release (fresh install -> verified promotion)
- Mean time to answer "why blocked?" (with proof)
- Support minutes per customer per month (must trend toward near-zero)
- Determinism regressions per release (must be near-zero)
-
-P1 metrics:
- Noise reduction ratio (reachable actionable findings vs raw findings)
- Audit export acceptance rate (auditors can consume without manual reconstruction)
- Upgrade success rate (low-friction updates, predictable migrations)
-
-## 9) Immediate product focus areas implied by this advisory
-
-When unsure what to build next, prefer investments in:
- Doctor: diagnostics coverage, fix suggestions, bundles, and environment validation
- Golden path onboarding: install -> connect -> scan -> gate -> promote -> export
- Determinism gates in CI and runtime checks for canonical outputs
- Evidence export bundles that map to common audit needs
- "Why blocked" trace quality, completeness, and replay verification
-
-Avoid "breadth expansion" unless it includes full operability coverage.
-
-## 10) How to apply this advisory in planning
-
-When processing this advisory:
- Ensure docs reflect the invariants and moats at the product overview level.
- Ensure sprints and tasks reference which moat they strengthen (M1-M5).
- If a sprint increases complexity without decreasing operator labor or improving evidence integrity, treat it as suspect.
-
-Archive this advisory only if it is superseded by a newer product-wide directive.