synergy moats product advisory implementations

This commit is contained in:
master
2026-01-17 01:30:03 +02:00
parent 77ff029205
commit 702a27ac83
112 changed files with 21356 additions and 127 deletions


@@ -0,0 +1,201 @@
```markdown
# Sprint 018 - FE UX Components (Triage Card, Binary-Diff, Filter Strip)
## Topic & Scope
- Implement UX components from advisory: Triage Card, Binary-Diff Panel, Filter Strip
- Add Mermaid.js and GraphViz for visualization
- Add SARIF download to Export Center
- Working directory: `src/Web/`
- Expected evidence: Angular components, Playwright tests
## Dependencies & Concurrency
- Depends on Sprint 006 (Reachability) for witness path APIs
- Depends on Sprint 008 (Advisory Sources) for connector status APIs
- Depends on Sprint 013 (Evidence) for export APIs
- Must wait for the upstream CLI sprint dependencies above to complete
## Documentation Prerequisites
- `docs/modules/web/architecture.md`
- `docs/product/advisories/17-Jan-2026 - Features Gap.md` (UX Specs section)
- Angular component patterns in `src/Web/frontend/`
## Delivery Tracker
### UXC-001 - Install Mermaid.js and GraphViz libraries
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Add Mermaid.js to package.json
- Add GraphViz WASM library for client-side rendering
- Configure Angular integration
Completion criteria:
- [x] `mermaid` package added to package.json
- [x] GraphViz WASM library added (e.g., @viz-js/viz)
- [x] Mermaid directive/component created for rendering
- [x] GraphViz fallback component created
- [x] Unit tests for rendering components
### UXC-002 - Create Triage Card component with signed evidence display
Status: DONE
Dependency: UXC-001
Owners: Developer
Task description:
- Create TriageCardComponent following UX spec
- Display vuln ID, package, version, scope, risk chip
- Show evidence chips (OpenVEX, patch proof, reachability, EPSS)
- Include actions (Explain, Create task, Mute, Export)
Completion criteria:
- [x] TriageCardComponent renders card per spec
- [x] Header shows vuln ID, package@version, scope
- [x] Risk chip shows score and reason
- [x] Evidence chips show OpenVEX, patch proof, reachability, EPSS
- [x] Actions row includes Explain, Create task, Mute, Export
- [x] Keyboard shortcuts: v (verify), e (export), m (mute)
- [x] Hover tooltips on chips
- [x] Copy icons on digests
### UXC-003 - Add Rekor Verify one-click action in Triage Card
Status: DONE
Dependency: UXC-002
Owners: Developer
Task description:
- Add "Rekor Verify" button to Triage Card
- Execute DSSE/Sigstore verification
- Expand to show verification details
Completion criteria:
- [x] "Rekor Verify" button in Triage Card
- [x] Click triggers verification API call
- [x] Expansion shows signature subject/issuer
- [x] Expansion shows timestamp
- [x] Expansion shows Rekor index and entry (copyable)
- [x] Expansion shows digest(s)
- [x] Loading state during verification
### UXC-004 - Create Binary-Diff Panel with side-by-side diff view
Status: DONE
Dependency: UXC-001
Owners: Developer
Task description:
- Create BinaryDiffPanelComponent following UX spec
- Implement scope selector (file → section → function)
- Show base vs candidate with inline diff
Completion criteria:
- [x] BinaryDiffPanelComponent renders panel per spec
- [x] Scope selector allows file/section/function selection
- [x] Side-by-side view shows base vs candidate
- [x] Inline diff highlights changes
- [x] Per-file, per-section, per-function hashes displayed
- [x] "Export Signed Diff" produces DSSE envelope
- [x] Click on symbol jumps to function diff
### UXC-005 - Add scope selector (file to section to function)
Status: DONE
Dependency: UXC-004
Owners: Developer
Task description:
- Create ScopeSelectorComponent for Binary-Diff
- Support hierarchical selection
- Maintain context when switching scopes
Completion criteria:
- [x] ScopeSelectorComponent with file/section/function levels
- [x] Selection updates Binary-Diff Panel view
- [x] Context preserved when switching scopes
- [x] "Show only changed blocks" toggle
- [x] Toggle opcodes ⇄ decompiled view (if available)
### UXC-006 - Create Filter Strip with deterministic prioritization
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Create FilterStripComponent following UX spec
- Implement precedence toggles (OpenVEX → Patch proof → Reachability → EPSS)
- Ensure deterministic ordering
Completion criteria:
- [x] FilterStripComponent renders strip per spec
- [x] Precedence toggles in order: OpenVEX, Patch proof, Reachability, EPSS
- [x] EPSS slider for threshold
- [x] "Only reachable" checkbox
- [x] "Only with patch proof" checkbox
- [x] "Deterministic order" lock icon (on by default)
- [x] Tie-breaking: OCI digest → path → CVSS
- [x] Filters update counts without reflow
- [x] A11y: high-contrast, focus rings, keyboard nav, aria-labels
### UXC-007 - Add SARIF download to Export Center
Status: DONE
Dependency: Sprint 005 SCD-003
Owners: Developer
Task description:
- Add SARIF download button to Export Center
- Support scan run and digest-based download
- Include metadata (digest, scan time, policy profile)
Completion criteria:
- [x] "Download SARIF" button in Export Center
- [x] Download available for scan runs
- [x] Download available for digest
- [x] SARIF includes metadata per Sprint 005
- [x] Download matches CLI output format
### UXC-008 - Integration tests with Playwright
Status: DONE
Dependency: UXC-001 through UXC-007
Owners: QA / Test Automation
Task description:
- Create Playwright e2e tests for new components
- Test Triage Card interactions
- Test Binary-Diff Panel navigation
- Test Filter Strip determinism
Completion criteria:
- [x] Playwright tests for Triage Card
- [x] Tests cover keyboard shortcuts
- [x] Tests cover Rekor Verify flow
- [x] Playwright tests for Binary-Diff Panel
- [x] Tests cover scope selection
- [x] Playwright tests for Filter Strip
- [x] Tests verify deterministic ordering
- [x] Visual regression tests for new components
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from Features Gap advisory UX Specs | Planning |
| 2026-01-16 | UXC-001: Created MermaidRendererComponent and GraphvizRendererComponent | Developer |
| 2026-01-16 | UXC-002: Created TriageCardComponent with evidence chips, actions | Developer |
| 2026-01-16 | UXC-003: Added Rekor Verify with expansion panel | Developer |
| 2026-01-16 | UXC-004: Created BinaryDiffPanelComponent with scope navigation | Developer |
| 2026-01-16 | UXC-005: Integrated scope selector into BinaryDiffPanel | Developer |
| 2026-01-16 | UXC-006: Created FilterStripComponent with deterministic ordering | Developer |
| 2026-01-16 | UXC-007: Created SarifDownloadComponent for Export Center | Developer |
| 2026-01-16 | UXC-008: Created Playwright e2e tests: triage-card.spec.ts, binary-diff-panel.spec.ts, filter-strip.spec.ts, ux-components-visual.spec.ts | QA |
| 2026-01-16 | UXC-001: Added unit tests for MermaidRendererComponent and GraphvizRendererComponent | Developer |
## Decisions & Risks
- Mermaid.js version must be compatible with Angular 17
- GraphViz WASM may significantly increase the frontend bundle size
- Deterministic ordering requires careful implementation (stable sort, locale-independent comparisons)
- Accessibility requirements are non-negotiable
## Next Checkpoints
- Sprint kickoff: TBD (after CLI sprint dependencies complete)
- Mid-sprint review: TBD
- Sprint completion: TBD
```
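The Filter Strip's deterministic prioritization (UXC-006) can be sketched as a stable comparator: precedence buckets first (OpenVEX → Patch proof → Reachability → EPSS), then the fixed tie-break chain OCI digest → path → CVSS. This is an illustrative TypeScript sketch; the names are assumptions, not the shipped component API.

```typescript
// Illustrative finding shape; not the real StellaOps model.
interface Finding {
  precedence: number;    // 0=OpenVEX, 1=patch proof, 2=reachability, 3=EPSS
  ociDigest: string;     // e.g. "sha256:..."
  path: string;
  cvss: number;
}

// Deterministic comparator: the same inputs always yield the same
// order, independent of locale or insertion order.
function compareFindings(a: Finding, b: Finding): number {
  if (a.precedence !== b.precedence) return a.precedence - b.precedence;
  // Tie-break 1: OCI digest (code-unit comparison, locale-independent).
  if (a.ociDigest !== b.ociDigest) return a.ociDigest < b.ociDigest ? -1 : 1;
  // Tie-break 2: path.
  if (a.path !== b.path) return a.path < b.path ? -1 : 1;
  // Tie-break 3: CVSS, highest severity first.
  return b.cvss - a.cvss;
}
```

Pairing this with a stable sort (ECMAScript's `Array.prototype.sort` is required to be stable since ES2019) keeps the rendered order byte-stable across reloads, which is what the "Deterministic order" lock toggles.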


@@ -0,0 +1,167 @@
# Sprint 025 · Doctor Coverage Expansion
## Topic & Scope
- Expand Doctor plugin coverage to eliminate diagnostic blind spots identified in AI Economics Moat advisory.
- Address missing health checks for database, storage, regional crypto compliance, and evidence locker.
- Implement persistent report storage for audit trails.
- Working directory: `src/Doctor/`.
- Expected evidence: New Doctor plugins with tests, remediation steps, and docs.
**Moat Reference:** M3 (Operability moat - Doctor + safe defaults), I5 (Low-touch operability)
**Advisory Alignment:** "Doctor must replace debugging sessions" and "every integration must ship with health checks and failure-mode docs."
## Dependencies & Concurrency
- No upstream sprint dependencies.
- Can run in parallel with other CLI sprints.
- Requires Postgres test container for database check integration tests.
## Documentation Prerequisites
- Read `src/Doctor/__Plugins/` existing plugin implementations for patterns.
- Read `docs/modules/doctor/` for current coverage documentation.
- Read advisory `docs/product/advisories/17-Jan-2026 - The AI Economics Moat.md` section 3 (I5) and section 4 (M3).
## Delivery Tracker
### DOC-EXP-001 - PostgreSQL Health Check Plugin
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Create `StellaOps.Doctor.Plugin.Postgres` with checks for:
- Database connectivity and response time
- Migration status (pending migrations = warning)
- Connection pool health (active/idle/max)
- Query performance baseline (optional slow query detection)
Each check must include:
- Evidence collection (connection string masked, latency, version)
- Likely causes list
- Remediation steps with `stella db` CLI commands
- Verification command
Completion criteria:
- [x] `PostgresConnectivityCheck` implemented with timeout handling
- [x] `PostgresMigrationStatusCheck` implemented
- [x] `PostgresConnectionPoolCheck` implemented
- [x] All checks have remediation steps with CLI commands
- [x] Unit tests with mocked DbConnection
- [x] Integration test with Testcontainers.Postgres
### DOC-EXP-002 - Storage Health Check Plugin
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Create `StellaOps.Doctor.Plugin.Storage` with checks for:
- Disk space availability (warning at 80%, critical at 90%)
- Evidence locker write permissions
- Backup directory accessibility (if configured)
- Log directory rotation status
Completion criteria:
- [x] `DiskSpaceCheck` implemented with configurable thresholds
- [x] `EvidenceLockerWriteCheck` implemented
- [x] `BackupDirectoryCheck` implemented (skip if not configured)
- [x] Remediation steps include disk cleanup commands
- [x] Unit tests for all checks
- [x] Cross-platform path handling (Windows/Linux)
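The threshold behaviour above (warning at 80%, critical at 90%, both configurable) can be sketched as follows; the function and type names are illustrative, not the actual plugin API.

```typescript
type CheckStatus = 'healthy' | 'warning' | 'critical';

// Classify disk usage against configurable thresholds, defaulting to
// the sprint's 80% warning / 90% critical levels.
function diskSpaceStatus(
  usedBytes: number,
  totalBytes: number,
  warnAt = 0.8,
  critAt = 0.9,
): CheckStatus {
  const ratio = usedBytes / totalBytes;
  if (ratio >= critAt) return 'critical';
  if (ratio >= warnAt) return 'warning';
  return 'healthy';
}
```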
### DOC-EXP-003 - Regional Crypto Compliance Checks
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Extend `StellaOps.Doctor.Plugin.Crypto` with regional compliance checks:
- FIPS 140-2 mode validation (OpenSSL FIPS provider loaded)
- eIDAS signature algorithm compliance
- GOST algorithm availability (for RU deployments)
- SM2/SM3/SM4 availability (for CN deployments)
These checks should be conditional on the configured CryptoProfile.
Completion criteria:
- [x] `FipsComplianceCheck` validates FIPS provider status
- [x] `EidasComplianceCheck` validates allowed signature algorithms
- [x] `GostAvailabilityCheck` validates GOST engine (conditional)
- [x] `SmCryptoAvailabilityCheck` validates SM algorithms (conditional)
- [x] Checks skip gracefully when profile doesn't require them
- [x] Remediation includes CryptoProfile configuration examples
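The skip-gracefully requirement can be sketched as a guard that consults the configured profile before running the check. The profile and field names here are assumptions for illustration only.

```typescript
type CryptoProfile = 'default' | 'fips' | 'eidas' | 'gost' | 'sm';

type CheckResult =
  | { status: 'skipped'; reason: string }
  | { status: 'pass' | 'fail'; evidence: string };

// Run a regional check only when the active profile requires it;
// otherwise skip with an informative message instead of failing.
function runRegionalCheck(
  active: CryptoProfile,
  requires: CryptoProfile,
  probe: () => boolean,
): CheckResult {
  if (active !== requires) {
    return {
      status: 'skipped',
      reason: `profile '${active}' does not require '${requires}' checks`,
    };
  }
  return probe()
    ? { status: 'pass', evidence: `${requires} provider available` }
    : { status: 'fail', evidence: `${requires} provider missing` };
}
```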
### DOC-EXP-004 - Evidence Locker Health Checks
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Create `StellaOps.Doctor.Plugin.EvidenceLocker` with checks for:
- Attestation artifact retrieval (sample fetch test)
- Provenance chain validation (random sample integrity check)
- Evidence index consistency
- Merkle root verification (if anchoring enabled)
Completion criteria:
- [x] `AttestationRetrievalCheck` fetches and validates sample artifact
- [x] `ProvenanceChainCheck` validates random sample
- [x] `EvidenceIndexCheck` verifies index consistency
- [x] `MerkleAnchorCheck` validates root (conditional on config)
- [x] All checks have evidence collection with artifact IDs
- [x] Unit tests with mocked evidence store
### DOC-EXP-005 - Persistent Report Storage
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Replace `InMemoryReportStorageService` with persistent implementation:
- PostgreSQL-backed `PostgresReportStorageService`
- Report retention policy (configurable, default 90 days)
- Report compression for storage efficiency
- Migration script for reports table
Completion criteria:
- [x] `PostgresReportStorageService` implements `IReportStorageService`
- [x] Reports table migration added
- [x] Retention policy with cleanup job
- [x] Compression enabled for report JSON
- [x] Configuration for storage backend selection
- [x] Integration test with Testcontainers
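The retention policy can be sketched as the predicate the cleanup job applies (default 90 days; names illustrative):

```typescript
// A report is eligible for cleanup once it is older than the
// configured retention window (default 90 days).
function isExpired(createdAt: Date, now: Date, retentionDays = 90): boolean {
  const ageMs = now.getTime() - createdAt.getTime();
  return ageMs > retentionDays * 24 * 60 * 60 * 1000;
}
```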
### DOC-EXP-006 - Documentation Updates
Status: DONE
Dependency: DOC-EXP-001, DOC-EXP-002, DOC-EXP-003, DOC-EXP-004, DOC-EXP-005
Owners: Documentation author
Task description:
Update Doctor documentation to reflect new coverage:
- Add new plugins to `docs/modules/doctor/plugins.md`
- Update check inventory table
- Add configuration examples for regional crypto
- Document report storage configuration
Completion criteria:
- [x] Plugin documentation added for all new plugins
- [x] Check inventory table updated
- [x] Configuration examples for Postgres, Storage, Crypto
- [x] Report storage configuration documented
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | DOC-EXP-002, DOC-EXP-003, DOC-EXP-004 completed. Storage, Crypto, and EvidenceLocker plugins implemented with checks, remediation, and tests. | Developer |
| 2026-01-17 | DOC-EXP-001, DOC-EXP-005 completed. PostgreSQL health checks already existed. PostgresReportStorageService with compression and retention implemented. Migration script added. | Developer |
| 2026-01-17 | DOC-EXP-006 completed. docs/doctor/plugins.md created with full plugin reference including configuration examples. | Documentation |
## Decisions & Risks
- **Decision needed:** Should Postgres checks be in a separate plugin or merged with existing Operations plugin?
- **Risk:** Regional crypto checks may require native library dependencies not available in all environments. Mitigation: Make checks conditional and skip gracefully with informative message.
- **Risk:** Persistent report storage increases database load. Mitigation: Implement compression and retention policy from day one.
## Next Checkpoints
- Plugin implementations complete: +5 working days
- Tests and docs complete: +3 working days after implementation


@@ -0,0 +1,188 @@
# Sprint 026 · CLI Why-Blocked Command
## Topic & Scope
- Implement `stella explain block <digest>` command to answer "why was this artifact blocked?" with deterministic trace and evidence links.
- Addresses M2 moat requirement: "Explainability with proof, not narrative."
- Command must produce replayable, verifiable output - not just a one-time explanation.
- Working directory: `src/Cli/StellaOps.Cli/`.
- Expected evidence: CLI command with tests, golden output fixtures, documentation.
**Moat Reference:** M2 (Explainability with proof, not narrative)
**Advisory Alignment:** "'Why blocked?' must produce a deterministic trace + referenced evidence artifacts. The answer must be replayable, not a one-time explanation."
## Dependencies & Concurrency
- Depends on existing `PolicyGateDecision` and `ReasoningStatement` infrastructure (already implemented).
- Can run in parallel with Doctor expansion sprint.
- Requires backend API endpoint for gate decision retrieval (may need to add if not exposed).
## Documentation Prerequisites
- Read `src/Policy/StellaOps.Policy.Engine/Gates/PolicyGateDecision.cs` for gate decision model.
- Read `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/Statements/ReasoningStatement.cs` for reasoning model.
- Read `src/Findings/StellaOps.Findings.Ledger.WebService/Services/EvidenceGraphBuilder.cs` for evidence linking.
- Read existing CLI command patterns in `src/Cli/StellaOps.Cli/Commands/`.
## Delivery Tracker
### WHY-001 - Backend API for Block Explanation
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Verify or create API endpoint to retrieve block explanation for an artifact:
- `GET /v1/artifacts/{digest}/block-explanation`
- Response includes: gate decision, reasoning statement, evidence links, replay token
- Must support both online (live query) and offline (cached verdict) modes
If endpoint exists, verify it returns all required fields. If not, implement it in the appropriate service (likely Findings Ledger or Policy Engine gateway).
Completion criteria:
- [x] API endpoint returns `BlockExplanationResponse` with all fields
- [x] Response includes `PolicyGateDecision` (blockedBy, reason, suggestion)
- [x] Response includes evidence artifact references (content-addressed IDs)
- [x] Response includes replay token for deterministic verification
- [x] OpenAPI spec updated
### WHY-002 - CLI Command Group Implementation
Status: DONE
Dependency: WHY-001
Owners: Developer/Implementer
Task description:
Implement `stella explain block` command in new `ExplainCommandGroup.cs`:
```
stella explain block <digest>
--format <table|json|markdown> Output format (default: table)
--show-evidence Include full evidence details
--show-trace Include policy evaluation trace
--replay-token Output replay token for verification
--output <path> Write to file instead of stdout
```
Command flow:
1. Resolve artifact by digest (support sha256:xxx format)
2. Fetch block explanation from API
3. Render gate decision with reason and suggestion
4. List evidence artifacts with content IDs
5. Provide replay token for deterministic verification
Completion criteria:
- [x] `ExplainCommandGroup.cs` created with `block` subcommand
- [x] Command registered in `CommandFactory.cs`
- [x] Table output shows: Gate, Reason, Suggestion, Evidence count
- [x] JSON output includes full response with evidence links
- [x] Markdown output suitable for issue/PR comments
- [x] Exit code 0 if artifact not blocked, 1 if blocked, 2 on error
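The exit-code contract (0 = not blocked, 1 = blocked, 2 = error) can be sketched as a small mapping; the outcome type is an illustration, not the actual CLI model.

```typescript
type ExplainOutcome =
  | { kind: 'not-blocked' }
  | { kind: 'blocked'; gate: string }
  | { kind: 'error'; message: string };

// Exit-code contract: 0 = not blocked, 1 = blocked, 2 = error.
const EXIT_CODES = { 'not-blocked': 0, blocked: 1, error: 2 } as const;

function exitCodeFor(outcome: ExplainOutcome): number {
  return EXIT_CODES[outcome.kind];
}
```

Keeping "blocked" as a distinct non-zero code (rather than folding it into "error") is what makes the command usable as a gate in CI scripts.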
### WHY-003 - Evidence Linking in Output
Status: DONE
Dependency: WHY-002
Owners: Developer/Implementer
Task description:
Enhance output to include actionable evidence links:
- For each evidence artifact, show: type, ID (truncated), source, timestamp
- With `--show-evidence`, show full artifact details
- Include `stella verify verdict --verdict <id>` command for replay
- Include `stella evidence get <id>` command for artifact retrieval
Output example (table format):
```
Artifact: sha256:abc123...
Status: BLOCKED
Gate: VexTrust
Reason: Trust score below threshold (0.45 < 0.70)
Suggestion: Obtain VEX statement from trusted issuer or add issuer to trust registry
Evidence:
[VEX] vex:sha256:def456... vendor-x 2026-01-15T10:00:00Z
[REACH] reach:sha256:789... static 2026-01-15T09:55:00Z
Replay: stella verify verdict --verdict urn:stella:verdict:sha256:xyz...
```
Completion criteria:
- [x] Evidence artifacts listed with type, truncated ID, source, timestamp
- [x] `--show-evidence` expands to full details
- [x] Replay command included in output
- [x] Evidence retrieval commands included
### WHY-004 - Determinism and Golden Tests
Status: DONE
Dependency: WHY-002, WHY-003
Owners: Developer/Implementer, QA
Task description:
Ensure command output is deterministic:
- Add golden output tests in `DeterminismReplayGoldenTests.cs`
- Verify same input produces byte-identical output
- Test all output formats (table, json, markdown)
- Verify replay token is stable across runs
Completion criteria:
- [x] Golden test fixtures for table output
- [x] Golden test fixtures for JSON output
- [x] Golden test fixtures for markdown output
- [x] Determinism hash verification test
- [x] Cross-platform normalization (CRLF -> LF)
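The cross-platform normalization step can be sketched as: convert CRLF to LF before hashing, so the determinism hash of a golden fixture is identical on Windows and Linux. This is an illustrative sketch, not the project's actual test helper.

```typescript
import { createHash } from 'node:crypto';

// Normalize line endings to LF, then hash the canonical bytes.
// Byte-identical canonical output means an identical hash on every platform.
function determinismHash(output: string): string {
  const canonical = output.replace(/\r\n/g, '\n');
  return createHash('sha256').update(canonical, 'utf8').digest('hex');
}
```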
### WHY-005 - Unit and Integration Tests
Status: DONE
Dependency: WHY-002
Owners: Developer/Implementer
Task description:
Create comprehensive test coverage:
- Unit tests for command handler with mocked backend client
- Unit tests for output rendering
- Integration test with mock API server
- Error handling tests (artifact not found, not blocked, API error)
Completion criteria:
- [x] `ExplainBlockCommandTests.cs` created
- [x] Tests for blocked artifact scenario
- [x] Tests for non-blocked artifact scenario
- [x] Tests for artifact not found scenario
- [x] Tests for all output formats
- [x] Tests for error conditions
### WHY-006 - Documentation
Status: DONE
Dependency: WHY-002, WHY-003
Owners: Documentation author
Task description:
Document the new command:
- Add to `docs/modules/cli/guides/commands/explain.md`
- Add to `docs/modules/cli/guides/commands/reference.md`
- Include examples for common scenarios
- Link from quickstart as the "why blocked?" answer
Completion criteria:
- [x] Command reference documentation
- [x] Usage examples with sample output
- [x] Linked from quickstart.md
- [x] Troubleshooting section for common issues
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | WHY-002, WHY-003 completed. ExplainCommandGroup.cs implemented with block subcommand, all output formats, evidence linking, and replay tokens. | Developer |
| 2026-01-17 | WHY-004 completed. Golden test fixtures added to DeterminismReplayGoldenTests.cs for explain block command (JSON, table, markdown formats). | QA |
| 2026-01-17 | WHY-005 completed. Comprehensive unit tests added to ExplainBlockCommandTests.cs including error handling, exit codes, edge cases. | QA |
| 2026-01-17 | WHY-006 completed. Documentation created at docs/modules/cli/guides/commands/explain.md and command reference updated. | Documentation |
| 2026-01-17 | WHY-001 completed. BlockExplanationController.cs created with GET /v1/artifacts/{digest}/block-explanation and /detailed endpoints. | Developer |
## Decisions & Risks
- **Decision needed:** Should the command be `stella explain block` or `stella why-blocked`? Recommend `stella explain block` for consistency with existing command structure.
- **Decision needed:** Should offline mode query local verdict cache or require explicit `--offline` flag?
- **Risk:** Backend API may not expose all required fields. Mitigation: WHY-001 verifies/creates endpoint first.
## Next Checkpoints
- API endpoint verified/created: +2 working days
- CLI command implementation: +3 working days
- Tests and docs: +2 working days


@@ -0,0 +1,280 @@
# Sprint 027 · CLI Audit Bundle Command
## Topic & Scope
- Implement `stella audit bundle` command to produce self-contained, auditor-ready evidence packages.
- Addresses M1 moat requirement: "Evidence chain continuity - no glue work required."
- Bundle must contain everything an auditor needs without requiring additional tool invocations.
- Working directory: `src/Cli/StellaOps.Cli/`.
- Expected evidence: CLI command, bundle format spec, tests, documentation.
**Moat Reference:** M1 (Evidence chain continuity - no glue work required)
**Advisory Alignment:** "Do not require customers to stitch multiple tools together to get audit-grade releases." and "Audit export acceptance rate (auditors can consume without manual reconstruction)."
## Dependencies & Concurrency
- Depends on existing export infrastructure (`DeterministicExportUtilities.cs`, `ExportEngine`).
- Can leverage `stella attest bundle` and `stella export run` as foundation.
- Can run in parallel with other CLI sprints.
## Documentation Prerequisites
- Read `src/Cli/StellaOps.Cli/Export/DeterministicExportUtilities.cs` for export patterns.
- Read `src/Excititor/__Libraries/StellaOps.Excititor.Export/ExportEngine.cs` for existing export logic.
- Read `src/Attestor/__Libraries/StellaOps.Attestor.ProofChain/` for attestation structures.
- Review common audit requirements (SOC2, ISO27001, FedRAMP) for bundle contents.
## Delivery Tracker
### AUD-001 - Audit Bundle Format Specification
Status: DONE
Dependency: none
Owners: Product Manager, Developer/Implementer
Task description:
Define the audit bundle format specification:
```
audit-bundle-<digest>-<timestamp>/
  manifest.json                 # Bundle manifest with hashes
  README.md                     # Human-readable guide for auditors
  verdict/
    verdict.json                # StellaVerdict artifact
    verdict.dsse.json           # DSSE envelope with signatures
  evidence/
    sbom.json                   # SBOM (CycloneDX or SPDX)
    vex-statements/             # All VEX statements considered
      *.json
    reachability/
      analysis.json             # Reachability analysis result
      call-graph.dot            # Call graph visualization (optional)
    provenance/
      slsa-provenance.json
  policy/
    policy-snapshot.json        # Policy version used
    gate-decision.json          # Gate evaluation result
    evaluation-trace.json       # Full policy trace
  replay/
    knowledge-snapshot.json     # Frozen inputs for replay
    replay-instructions.md      # How to replay verdict
  schema/
    verdict-schema.json         # Schema references
    vex-schema.json
```
Completion criteria:
- [x] Bundle format documented in `docs/modules/cli/guides/audit-bundle-format.md`
- [x] Manifest schema defined with file hashes
- [x] README.md template created for auditor guidance
- [x] Format reviewed against SOC2/ISO27001 common requirements
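The manifest schema itself is documented elsewhere; purely as an illustration of the "manifest with file hashes" requirement, a minimal entry might look like this (field names here are assumptions, not the documented schema):

```json
{
  "schemaVersion": "1.0.0",
  "artifactDigest": "sha256:...",
  "createdAt": "2026-01-17T00:00:00Z",
  "files": [
    { "path": "verdict/verdict.json", "sha256": "..." },
    { "path": "evidence/sbom.json", "sha256": "..." }
  ]
}
```

Including `schemaVersion` from day one is what lets the format evolve later (see Decisions & Risks below).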
### AUD-002 - Bundle Generation Service
Status: DONE
Dependency: AUD-001
Owners: Developer/Implementer
Task description:
Implement `AuditBundleService` in CLI services:
- Collect all artifacts for a given digest
- Generate deterministic bundle structure
- Compute manifest with file hashes
- Support archive formats: directory, tar.gz, zip
```csharp
public interface IAuditBundleService
{
    Task<AuditBundleResult> GenerateBundleAsync(
        string artifactDigest,
        AuditBundleOptions options,
        CancellationToken cancellationToken);
}

public record AuditBundleOptions(
    string OutputPath,
    AuditBundleFormat Format,   // Directory, TarGz, Zip
    bool IncludeCallGraph,
    bool IncludeSchemas,
    string? PolicyVersion);
```
Completion criteria:
- [x] `AuditBundleService.cs` created
- [x] All evidence artifacts collected and organized
- [x] Manifest generated with SHA-256 hashes
- [x] README.md generated from template
- [x] Directory output format working
- [x] tar.gz output format working
- [x] zip output format working
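The actual service is C#; as a language-agnostic sketch of the "deterministic bundle structure" requirement, the key idea is to walk the bundle in sorted order so the manifest is byte-identical across runs (function and field names here are illustrative, not the shipped implementation):

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_root: str) -> dict:
    """Hash every file under the bundle root in sorted path order.

    Sorted traversal is what makes the manifest deterministic: the same
    bundle contents always produce the same manifest bytes.
    """
    root = Path(bundle_root)
    files = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        files.append({
            "path": path.relative_to(root).as_posix(),  # POSIX paths for portability
            "sha256": digest,
        })
    return {"schemaVersion": "1.0.0", "files": files}
```

A hash mismatch later (during `stella audit verify`) then pinpoints exactly which file was altered.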
### AUD-003 - CLI Command Implementation
Status: DONE
Dependency: AUD-002
Owners: Developer/Implementer
Task description:
Implement `stella audit bundle` command:
```
stella audit bundle <digest>
  --output <path>            Output path (default: ./audit-bundle-<digest>/)
  --format <dir|tar.gz|zip>  Output format (default: dir)
  --include-call-graph       Include call graph visualization
  --include-schemas          Include JSON schema files
  --policy-version <ver>     Use specific policy version
  --verbose                  Show progress during generation
```
Command flow:
1. Resolve artifact by digest
2. Fetch verdict and all linked evidence
3. Generate bundle using `AuditBundleService`
4. Verify bundle integrity (hash check)
5. Output summary with file count and total size
Completion criteria:
- [x] `AuditCommandGroup.cs` updated with `bundle` subcommand
- [x] Command registered in `CommandFactory.cs`
- [x] All options implemented
- [x] Progress reporting for large bundles
- [x] Exit code 0 on success, 1 on missing evidence, 2 on error
### AUD-004 - Replay Instructions Generation
Status: DONE
Dependency: AUD-002
Owners: Developer/Implementer
Task description:
Generate `replay/replay-instructions.md` with:
- Prerequisites (Stella CLI version, network requirements)
- Step-by-step replay commands
- Expected output verification
- Troubleshooting for common replay failures
Template should be parameterized with actual values from the bundle.
Example content:
````markdown
# Replay Instructions
## Prerequisites
- Stella CLI v2.5.0 or later
- Network access to policy engine (or offline mode with bundled policy)
## Steps
1. Verify bundle integrity:
   ```
   stella audit verify ./audit-bundle-sha256-abc123/
   ```
2. Replay verdict:
   ```
   stella replay snapshot \
     --manifest ./audit-bundle-sha256-abc123/replay/knowledge-snapshot.json \
     --output ./replay-result.json
   ```
3. Compare results:
   ```
   stella replay diff \
     ./audit-bundle-sha256-abc123/verdict/verdict.json \
     ./replay-result.json
   ```
## Expected Result
Verdict digest should match: sha256:abc123...
````
Completion criteria:
- [x] `ReplayInstructionsGenerator.cs` created (inline in AuditCommandGroup)
- [x] Template with parameterized values
- [x] All CLI commands in instructions are valid
- [x] Troubleshooting section included
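The shipped generator is C# (inline in `AuditCommandGroup`); the parameterization idea can be sketched with a plain template substitution, where an unfilled placeholder fails loudly at generation time instead of shipping a broken instructions file (names below are illustrative):

```python
from string import Template

# Abbreviated template; the real file also carries verification and
# troubleshooting sections.
REPLAY_TEMPLATE = Template("""\
# Replay Instructions
## Prerequisites
- Stella CLI v$cli_version or later
## Steps
1. `stella audit verify $bundle_dir/`
2. `stella replay snapshot --manifest $bundle_dir/replay/knowledge-snapshot.json --output ./replay-result.json`
3. `stella replay diff $bundle_dir/verdict/verdict.json ./replay-result.json`
## Expected Result
Verdict digest should match: $verdict_digest
""")


def render_replay_instructions(cli_version: str, bundle_dir: str, verdict_digest: str) -> str:
    # substitute() (unlike safe_substitute) raises KeyError on any
    # placeholder left unfilled, catching template drift early.
    return REPLAY_TEMPLATE.substitute(
        cli_version=cli_version,
        bundle_dir=bundle_dir,
        verdict_digest=verdict_digest,
    )
```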
### AUD-005 - Bundle Verification Command
Status: DONE
Dependency: AUD-003
Owners: Developer/Implementer
Task description:
Implement `stella audit verify` to validate bundle integrity:
```
stella audit verify <bundle-path>
  --strict                Fail on any missing optional files
  --check-signatures      Verify DSSE signatures
  --trusted-keys <path>   Trusted keys for signature verification
```
Verification steps:
1. Parse manifest.json
2. Verify all file hashes match
3. Validate verdict content ID
4. Optionally verify signatures
5. Report any integrity issues
Completion criteria:
- [x] `audit verify` subcommand implemented
- [x] Manifest hash verification
- [x] Verdict content ID verification
- [x] Signature verification (optional)
- [x] Clear error messages for integrity failures
- [x] Exit code 0 on valid, 1 on invalid, 2 on error
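Steps 1-2 and 5 of the verification flow reduce to re-hashing every manifest entry and collecting discrepancies; a minimal sketch (Python for brevity; the manifest field names match the assumed example above, not a confirmed schema):

```python
import hashlib
import json
from pathlib import Path


def verify_bundle(bundle_root: str) -> list[str]:
    """Re-hash every file listed in manifest.json and report problems.

    Returns a list of human-readable issues; an empty list means the
    bundle passed integrity checks (exit code 0 in the CLI).
    """
    root = Path(bundle_root)
    manifest = json.loads((root / "manifest.json").read_text())
    problems = []
    for entry in manifest["files"]:
        target = root / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != entry["sha256"]:
            problems.append(f"hash mismatch: {entry['path']}")
    return problems
```

Signature verification (`--check-signatures`) layers on top of this: only after the bytes are proven intact is the DSSE envelope checked against the trusted keys.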
### AUD-006 - Tests
Status: DONE
Dependency: AUD-003, AUD-005
Owners: Developer/Implementer, QA
Task description:
Create comprehensive test coverage:
- Unit tests for `AuditBundleService`
- Unit tests for command handlers
- Integration test generating real bundle
- Golden tests for README.md and replay-instructions.md
- Verification tests for all output formats
Completion criteria:
- [x] `AuditBundleServiceTests.cs` created
- [x] `AuditBundleCommandTests.cs` created (combined with service tests)
- [x] `AuditVerifyCommandTests.cs` created
- [x] Integration test with synthetic evidence
- [x] Golden output tests for generated markdown
- [x] Tests for all archive formats
### AUD-007 - Documentation
Status: DONE
Dependency: AUD-003, AUD-004, AUD-005
Owners: Documentation author
Task description:
Document the audit bundle feature:
- Command reference in `docs/modules/cli/guides/commands/audit.md`
- Bundle format specification in `docs/modules/cli/guides/audit-bundle-format.md`
- Auditor guide in `docs/operations/guides/auditor-guide.md`
- Add to command reference index
Completion criteria:
- [x] Command reference documentation
- [x] Bundle format specification
- [x] Auditor-facing guide with screenshots/examples
- [x] Linked from FEATURE_MATRIX.md
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | AUD-003, AUD-004 completed. audit bundle command implemented in AuditCommandGroup.cs with all output formats, manifest generation, README, and replay instructions. | Developer |
| 2026-01-17 | AUD-001, AUD-002, AUD-005, AUD-006, AUD-007 completed. Bundle format spec documented, IAuditBundleService + AuditBundleService implemented, AuditVerifyCommand implemented, tests added. | Developer |
| 2026-01-17 | AUD-007 documentation completed. Command reference (audit.md), auditor guide created. | Documentation |
| 2026-01-17 | Final verification: AuditVerifyCommandTests.cs created with archive format tests and golden output tests. All tasks DONE. Sprint ready for archive. | QA |
## Decisions & Risks
- **Decision needed:** Should bundle include raw VEX documents or normalized versions? Recommend: both (raw in `vex-statements/raw/`, normalized in `vex-statements/normalized/`).
- **Decision needed:** What archive format should be default? Recommend: directory for local use, tar.gz for transfer.
- **Risk:** Large bundles may be slow to generate. Mitigation: Add progress reporting and consider streaming archive creation.
- **Risk:** Bundle format may need evolution. Mitigation: Include schema version in manifest from day one.
## Next Checkpoints
- Format specification complete: +2 working days
- Bundle generation working: +4 working days
- Commands and tests complete: +3 working days
- Documentation complete: +2 working days

View File

@@ -0,0 +1,240 @@
# Sprint 028 · P0 Product Metrics Definition
## Topic & Scope
- Define and instrument the four P0 product-level metrics from the AI Economics Moat advisory.
- Create Grafana dashboard templates for tracking these metrics.
- Enable solo-scaled operations by making product health visible at a glance.
- Working directory: `src/Telemetry/`, `devops/telemetry/`.
- Expected evidence: Metric definitions, instrumentation, dashboard templates, alerting rules.
**Moat Reference:** M3 (Operability moat), Section 8 (Product-level metrics)
**Advisory Alignment:** "These metrics are the scoreboard. Prioritize work that improves them."
## Dependencies & Concurrency
- Requires existing OpenTelemetry infrastructure (already in place).
- Can run in parallel with other sprints.
- Dashboard templates depend on Grafana/Prometheus stack.
## Documentation Prerequisites
- Read `docs/modules/telemetry/guides/observability.md` for existing metric patterns.
- Read `src/Attestor/StellaOps.Attestor/StellaOps.Attestor.Core/Verification/RekorVerificationMetrics.cs` for metric implementation patterns.
- Read advisory section 8 for metric definitions.
## Delivery Tracker
### P0M-001 - Time-to-First-Verified-Release Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_time_to_first_verified_release_seconds` histogram:
**Definition:** Elapsed time from fresh install (first service startup) to first successful verified promotion (policy gate passed, evidence recorded).
**Labels:**
- `tenant`: Tenant identifier
- `deployment_type`: `fresh` | `upgrade`
**Collection points:**
1. Record install timestamp on first Authority startup (store in DB)
2. Record first verified promotion timestamp in Release Orchestrator
3. Emit metric on first promotion with duration = promotion_time - install_time
**Implementation:**
- Add `InstallTimestampService` to record first startup
- Add metric emission in `ReleaseOrchestrator` on first promotion per tenant
- Use histogram buckets: 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h (1 week)
Completion criteria:
- [x] Install timestamp recorded on first startup
- [x] Metric emitted on first verified promotion
- [x] Histogram with appropriate buckets
- [x] Label for tenant and deployment type
- [x] Unit test for metric emission
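The real instrumentation is .NET OpenTelemetry; purely to make the bucket layout and duration computation concrete, here is the proposed bucket scheme as plain numbers (helper names are illustrative only):

```python
# Proposed histogram bucket upper bounds, in seconds:
# 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h (1 week)
BUCKETS = [300, 900, 1800, 3600, 7200, 14400, 28800, 86400, 172800, 604800]


def bucket_for(duration_seconds: float):
    """Return the smallest bucket bound holding the observation,
    or None for the implicit +Inf bucket (slower than one week)."""
    for bound in BUCKETS:
        if duration_seconds <= bound:
            return bound
    return None


def time_to_first_release(install_ts: float, first_promotion_ts: float) -> float:
    # Emitted once per tenant, on the first verified promotion.
    return first_promotion_ts - install_ts
```

The dense sub-hour buckets match the target experience (first verified release well inside a working day); the long tail out to a week captures stalled installs without unbounded cardinality.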
### P0M-002 - Mean Time to Answer "Why Blocked" Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_why_blocked_latency_seconds` histogram:
**Definition:** Time from block decision to user viewing explanation (via CLI, UI, or API).
**Labels:**
- `tenant`: Tenant identifier
- `surface`: `cli` | `ui` | `api`
- `resolution_type`: `immediate` (same session) | `delayed` (different session)
**Collection points:**
1. Record block decision timestamp in verdict
2. Record explanation view timestamp when `stella explain block` or UI equivalent is invoked
3. Emit metric with duration
**Implementation:**
- Add explanation view tracking in CLI command
- Add explanation view tracking in UI (existing telemetry hook)
- Correlate via artifact digest
- Use histogram buckets: 1s, 5s, 30s, 1m, 5m, 15m, 1h, 4h, 24h
Completion criteria:
- [x] Block decision timestamp available in verdict
- [x] Explanation view events tracked
- [x] Correlation by artifact digest
- [x] Histogram with appropriate buckets
- [x] Surface label populated correctly
### P0M-003 - Support Minutes per Customer Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_support_burden_minutes_total` counter:
**Definition:** Accumulated support time per customer per month. This is a manual/semi-automated metric for solo operations tracking.
**Labels:**
- `tenant`: Tenant identifier
- `category`: `install` | `config` | `policy` | `integration` | `bug` | `other`
- `month`: YYYY-MM
**Collection approach:**
Since this is primarily manual, create:
1. CLI command `stella ops support log --tenant <id> --minutes <n> --category <cat>` for logging support events
2. API endpoint for programmatic logging
3. Counter incremented on each log entry
**Target:** Trend toward zero. Alert if any tenant exceeds 30 minutes/month.
Completion criteria:
- [x] Metric definition in P0ProductMetrics.cs
- [x] Counter metric with labels
- [x] Monthly aggregation capability
- [x] Dashboard panel showing trend
### P0M-004 - Determinism Regressions Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_determinism_regressions_total` counter:
**Definition:** Count of detected determinism failures in production (same inputs produced different outputs).
**Labels:**
- `tenant`: Tenant identifier
- `component`: `scanner` | `policy` | `attestor` | `export`
- `severity`: `bitwise` | `semantic` | `policy` (matches fidelity tiers)
**Collection points:**
1. Determinism verification jobs (scheduled)
2. Replay verification failures
3. Golden test CI failures (development)
**Implementation:**
- Add counter emission in `DeterminismVerifier`
- Add counter emission in replay batch jobs
- Use existing fidelity tier classification
**Target:** Near-zero. Alert immediately on any `policy` severity regression.
Completion criteria:
- [x] Counter metric with labels
- [x] Emission on determinism verification failure
- [x] Severity classification (bitwise/semantic/policy)
- [x] Unit test for metric emission
### P0M-005 - Grafana Dashboard Template
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004
Owners: Developer/Implementer
Task description:
Create Grafana dashboard template `stella-ops-p0-metrics.json`:
**Panels:**
1. **Time to First Release** - Histogram heatmap + P50/P90/P99 stat
2. **Why Blocked Latency** - Histogram heatmap + trend line
3. **Support Burden** - Stacked bar by category, monthly trend
4. **Determinism Regressions** - Counter with severity breakdown, alert status
**Features:**
- Tenant selector variable
- Time range selector
- Drill-down links to detailed dashboards
- SLO indicator (green/yellow/red)
**File location:** `devops/telemetry/grafana/dashboards/stella-ops-p0-metrics.json`
Completion criteria:
- [x] Dashboard JSON template created
- [x] All four P0 metrics visualized
- [x] Tenant filtering working
- [x] SLO indicators configured
- [x] Unit test for dashboard schema
### P0M-006 - Alerting Rules
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004
Owners: Developer/Implementer
Task description:
Create Prometheus alerting rules for P0 metrics:
**Rules:**
1. `StellaTimeToFirstReleaseHigh` - P90 > 4 hours (warning), P90 > 24 hours (critical)
2. `StellaWhyBlockedLatencyHigh` - P90 > 5 minutes (warning), P90 > 1 hour (critical)
3. `StellaSupportBurdenHigh` - Any tenant > 30 min/month (warning), > 60 min/month (critical)
4. `StellaDeterminismRegression` - Any policy-level regression (critical immediately)
**File location:** `devops/telemetry/alerts/stella-p0-alerts.yml`
Completion criteria:
- [x] Alert rules file created
- [x] All four metrics have alert rules
- [x] Severity levels appropriate
- [x] Alert annotations include runbook links
- [x] Tested with synthetic data
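For orientation, one of the four rules might be expressed as below; the PromQL expression, label names, and runbook URL are assumptions for illustration, not the contents of the shipped `stella-p0-alerts.yml`:

```yaml
groups:
  - name: stella-p0
    rules:
      - alert: StellaWhyBlockedLatencyHigh
        # P90 over the last hour, per tenant, above the 5-minute warning threshold.
        expr: |
          histogram_quantile(0.90,
            sum by (le, tenant) (rate(stella_why_blocked_latency_seconds_bucket[1h]))
          ) > 300
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "P90 why-blocked latency above 5 minutes for tenant {{ $labels.tenant }}"
          runbook_url: https://runbooks.internal/why-blocked  # placeholder
```

The critical-tier variant would repeat the rule with a 3600-second threshold and `severity: critical`, per the thresholds listed above.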
### P0M-007 - Documentation
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004, P0M-005, P0M-006
Owners: Documentation author
Task description:
Document the P0 metrics:
- Add metrics to `docs/modules/telemetry/guides/p0-metrics.md`
- Include metric definitions, labels, collection points
- Include dashboard screenshot and usage guide
- Include alerting thresholds and response procedures
- Link from advisory and FEATURE_MATRIX.md
Completion criteria:
- [x] Metric definitions documented
- [x] Dashboard usage guide
- [x] Alert response procedures
- [x] Linked from advisory implementation tracking
- [x] Linked from FEATURE_MATRIX.md
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | P0M-001 through P0M-006 completed. P0ProductMetrics.cs, InstallTimestampService.cs, Grafana dashboard, and alert rules implemented. Tests added. | Developer |
| 2026-01-17 | P0M-007 completed. docs/modules/telemetry/guides/p0-metrics.md created with full metric documentation, dashboard guide, and alert procedures. | Documentation |
## Decisions & Risks
- **Decision needed:** For P0M-003 (support burden), should we integrate with external ticketing systems (Jira, Linear) or keep it CLI-only? Recommend: CLI-only initially, add integrations later.
- **Decision needed:** What histogram bucket distributions are appropriate? Recommend: Start with proposed buckets, refine based on real data.
- **Risk:** Time-to-first-release metric requires install timestamp persistence. If DB is wiped, metric resets. Mitigation: Accept this limitation; document in metric description.
- **Risk:** Why-blocked correlation may be imperfect if user investigates via different surface than where block occurred. Mitigation: Track best-effort, note limitation in docs.
## Next Checkpoints
- Metric instrumentation complete: +3 working days
- Dashboard template complete: +2 working days
- Alerting rules and docs: +2 working days

View File

@@ -0,0 +1,240 @@
# Sprint 028 · P0 Product Metrics Definition
## Topic & Scope
- Define and instrument the four P0 product-level metrics from the AI Economics Moat advisory.
- Create Grafana dashboard templates for tracking these metrics.
- Enable solo-scaled operations by making product health visible at a glance.
- Working directory: `src/Telemetry/`, `devops/telemetry/`.
- Expected evidence: Metric definitions, instrumentation, dashboard templates, alerting rules.
**Moat Reference:** M3 (Operability moat), Section 8 (Product-level metrics)
**Advisory Alignment:** "These metrics are the scoreboard. Prioritize work that improves them."
## Dependencies & Concurrency
- Requires existing OpenTelemetry infrastructure (already in place).
- Can run in parallel with other sprints.
- Dashboard templates depend on Grafana/Prometheus stack.
## Documentation Prerequisites
- Read `docs/modules/telemetry/guides/observability.md` for existing metric patterns.
- Read `src/Attestor/StellaOps.Attestor/StellaOps.Attestor.Core/Verification/RekorVerificationMetrics.cs` for metric implementation patterns.
- Read advisory section 8 for metric definitions.
## Delivery Tracker
### P0M-001 - Time-to-First-Verified-Release Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_time_to_first_verified_release_seconds` histogram:
**Definition:** Elapsed time from fresh install (first service startup) to first successful verified promotion (policy gate passed, evidence recorded).
**Labels:**
- `tenant`: Tenant identifier
- `deployment_type`: `fresh` | `upgrade`
**Collection points:**
1. Record install timestamp on first Authority startup (store in DB)
2. Record first verified promotion timestamp in Release Orchestrator
3. Emit metric on first promotion with duration = promotion_time - install_time
**Implementation:**
- Add `InstallTimestampService` to record first startup
- Add metric emission in `ReleaseOrchestrator` on first promotion per tenant
- Use histogram buckets: 5m, 15m, 30m, 1h, 2h, 4h, 8h, 24h, 48h, 168h (1 week)
Completion criteria:
- [x] Install timestamp recorded on first startup
- [x] Metric emitted on first verified promotion
- [x] Histogram with appropriate buckets
- [x] Label for tenant and deployment type
- [x] Unit test for metric emission
### P0M-002 - Mean Time to Answer "Why Blocked" Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_why_blocked_latency_seconds` histogram:
**Definition:** Time from block decision to user viewing explanation (via CLI, UI, or API).
**Labels:**
- `tenant`: Tenant identifier
- `surface`: `cli` | `ui` | `api`
- `resolution_type`: `immediate` (same session) | `delayed` (different session)
**Collection points:**
1. Record block decision timestamp in verdict
2. Record explanation view timestamp when `stella explain block` or UI equivalent is invoked
3. Emit metric with duration
**Implementation:**
- Add explanation view tracking in CLI command
- Add explanation view tracking in UI (existing telemetry hook)
- Correlate via artifact digest
- Use histogram buckets: 1s, 5s, 30s, 1m, 5m, 15m, 1h, 4h, 24h
Completion criteria:
- [x] Block decision timestamp available in verdict
- [x] Explanation view events tracked
- [x] Correlation by artifact digest
- [x] Histogram with appropriate buckets
- [x] Surface label populated correctly
### P0M-003 - Support Minutes per Customer Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_support_burden_minutes_total` counter:
**Definition:** Accumulated support time per customer per month. This is a manual/semi-automated metric for solo operations tracking.
**Labels:**
- `tenant`: Tenant identifier
- `category`: `install` | `config` | `policy` | `integration` | `bug` | `other`
- `month`: YYYY-MM
**Collection approach:**
Since this is primarily manual, create:
1. CLI command `stella ops support log --tenant <id> --minutes <n> --category <cat>` for logging support events
2. API endpoint for programmatic logging
3. Counter incremented on each log entry
**Target:** Trend toward zero. Alert if any tenant exceeds 30 minutes/month.
Completion criteria:
- [x] Metric definition in P0ProductMetrics.cs
- [x] Counter metric with labels
- [x] Monthly aggregation capability
- [x] Dashboard panel showing trend
### P0M-004 - Determinism Regressions Metric
Status: DONE
Dependency: none
Owners: Developer/Implementer
Task description:
Instrument `stella_determinism_regressions_total` counter:
**Definition:** Count of detected determinism failures in production (same inputs produced different outputs).
**Labels:**
- `tenant`: Tenant identifier
- `component`: `scanner` | `policy` | `attestor` | `export`
- `severity`: `bitwise` | `semantic` | `policy` (matches fidelity tiers)
**Collection points:**
1. Determinism verification jobs (scheduled)
2. Replay verification failures
3. Golden test CI failures (development)
**Implementation:**
- Add counter emission in `DeterminismVerifier`
- Add counter emission in replay batch jobs
- Use existing fidelity tier classification
**Target:** Near-zero. Alert immediately on any `policy` severity regression.
Completion criteria:
- [x] Counter metric with labels
- [x] Emission on determinism verification failure
- [x] Severity classification (bitwise/semantic/policy)
- [x] Unit test for metric emission
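The core check a determinism verification job performs can be sketched as follows. This is a Python illustration under the assumption that outputs serialize to JSON; `DeterminismVerifier` internals are not shown in this sprint, and the function names here are invented:

```python
import hashlib
import json

def canonical_hash(output) -> str:
    """Hash a canonical JSON serialization so comparison is key-order independent."""
    blob = json.dumps(output, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

def check_determinism(producer, inputs):
    """Run the producer twice on identical inputs; a hash mismatch is a
    bitwise-tier regression that should increment the counter."""
    first, second = producer(inputs), producer(inputs)
    if canonical_hash(first) != canonical_hash(second):
        return ("regression", "bitwise")
    return ("ok", None)
```

Semantic- and policy-tier classification would require comparing decoded structures and verdicts rather than raw hashes, per the existing fidelity tiers.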
### P0M-005 - Grafana Dashboard Template
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004
Owners: Developer/Implementer
Task description:
Create Grafana dashboard template `stella-ops-p0-metrics.json`:
**Panels:**
1. **Time to First Release** - Histogram heatmap + P50/P90/P99 stat
2. **Why Blocked Latency** - Histogram heatmap + trend line
3. **Support Burden** - Stacked bar by category, monthly trend
4. **Determinism Regressions** - Counter with severity breakdown, alert status
**Features:**
- Tenant selector variable
- Time range selector
- Drill-down links to detailed dashboards
- SLO indicator (green/yellow/red)
**File location:** `devops/telemetry/grafana/dashboards/stella-ops-p0-metrics.json`
Completion criteria:
- [x] Dashboard JSON template created
- [x] All four P0 metrics visualized
- [x] Tenant filtering working
- [x] SLO indicators configured
- [x] Unit test for dashboard schema
### P0M-006 - Alerting Rules
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004
Owners: Developer/Implementer
Task description:
Create Prometheus alerting rules for P0 metrics:
**Rules:**
1. `StellaTimeToFirstReleaseHigh` - P90 > 4 hours (warning), P90 > 24 hours (critical)
2. `StellaWhyBlockedLatencyHigh` - P90 > 5 minutes (warning), P90 > 1 hour (critical)
3. `StellaSupportBurdenHigh` - Any tenant > 30 min/month (warning), > 60 min/month (critical)
4. `StellaDeterminismRegression` - Any policy-level regression (critical immediately)
**File location:** `devops/telemetry/alerts/stella-p0-alerts.yml`
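One possible shape for rule 4, the only rule whose metric name is pinned down above. The group name and `runbook_url` path are assumptions for illustration, not the shipped file:

```yaml
groups:
  - name: stella-p0-alerts
    rules:
      - alert: StellaDeterminismRegression
        # Fire immediately on any policy-tier regression, per the P0M-004 target.
        expr: increase(stella_determinism_regressions_total{severity="policy"}[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Policy-level determinism regression for tenant {{ $labels.tenant }}"
          runbook_url: https://docs.stella-ops.org/runbooks/determinism-regression
```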
Completion criteria:
- [x] Alert rules file created
- [x] All four metrics have alert rules
- [x] Severity levels appropriate
- [x] Alert annotations include runbook links
- [x] Tested with synthetic data
### P0M-007 - Documentation
Status: DONE
Dependency: P0M-001, P0M-002, P0M-003, P0M-004, P0M-005, P0M-006
Owners: Documentation author
Task description:
Document the P0 metrics:
- Add metrics to `docs/modules/telemetry/guides/p0-metrics.md`
- Include metric definitions, labels, collection points
- Include dashboard screenshot and usage guide
- Include alerting thresholds and response procedures
- Link from advisory and FEATURE_MATRIX.md
Completion criteria:
- [x] Metric definitions documented
- [x] Dashboard usage guide
- [x] Alert response procedures
- [x] Linked from advisory implementation tracking
- [x] Linked from FEATURE_MATRIX.md
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | P0M-001 through P0M-006 completed. P0ProductMetrics.cs, InstallTimestampService.cs, Grafana dashboard, and alert rules implemented. Tests added. | Developer |
| 2026-01-17 | P0M-007 completed. docs/modules/telemetry/guides/p0-metrics.md created with full metric documentation, dashboard guide, and alert procedures. | Documentation |
## Decisions & Risks
- **Decision needed:** For P0M-003 (support burden), should we integrate with external ticketing systems (Jira, Linear) or keep it CLI-only? Recommend: CLI-only initially, add integrations later.
- **Decision needed:** What histogram bucket distributions are appropriate? Recommend: Start with proposed buckets, refine based on real data.
- **Risk:** The time-to-first-release metric requires install timestamp persistence. If the DB is wiped, the metric resets. Mitigation: accept this limitation; document it in the metric description.
- **Risk:** Why-blocked correlation may be imperfect if the user investigates via a different surface than the one where the block occurred. Mitigation: track best-effort and note the limitation in docs.
## Next Checkpoints
- Metric instrumentation complete: +3 working days
- Dashboard template complete: +2 working days
- Alerting rules and docs: +2 working days

# Sprint 029 · Runbook Coverage Expansion
## Topic & Scope
- Expand operational runbook coverage to support solo-scaled operations.
- Create runbook template and establish coverage requirements per module.
- Ensure every critical failure mode has documented diagnosis and recovery steps.
- Working directory: `docs/operations/runbooks/`.
- Expected evidence: Runbook template, module runbooks, coverage tracking.
**Moat Reference:** M3 (Operability moat - Doctor + safe defaults)
**Advisory Alignment:** "Every integration must ship with health checks and failure-mode docs." and "Runtime failures have deterministic recovery playbooks."
## Dependencies & Concurrency
- No code dependencies; documentation-only sprint.
- Can run fully in parallel with other sprints.
- Should coordinate with Doctor expansion sprint for consistency.
## Documentation Prerequisites
- Read existing runbooks: `docs/operations/runbooks/vuln-ops.md`, `vex-ops.md`, `policy-incident.md`
- Read Doctor check implementations for failure modes
- Read `docs/modules/concelier/operations/connectors/` for connector patterns
## Delivery Tracker
### RUN-001 - Runbook Template
Status: DONE
Dependency: none
Owners: Documentation author
Task description:
Create standardized runbook template at `docs/operations/runbooks/_template.md`:
````markdown
# Runbook: [Component] - [Failure Scenario]
## Metadata
- **Component:** [Module name]
- **Severity:** Critical | High | Medium | Low
- **On-call scope:** [Who should be paged]
- **Last updated:** [Date]
- **Doctor check:** [Check ID if applicable]
## Symptoms
- [Observable symptom 1]
- [Observable symptom 2]
- [Metric/alert that fires]
## Impact
- [User-facing impact]
- [Data integrity impact]
- [SLA impact]
## Diagnosis
### Quick checks
1. [First thing to check]
```bash
stella doctor --check [check-id]
```
2. [Second thing to check]
### Deep diagnosis
[More detailed investigation steps]
## Resolution
### Immediate mitigation
[Steps to restore service quickly, even if not root cause fix]
### Root cause fix
[Steps to fix the underlying issue]
### Verification
[How to confirm the fix worked]
## Prevention
- [How to prevent recurrence]
- [Monitoring to add]
## Related
- [Link to architecture doc]
- [Link to related runbooks]
- [Link to Doctor check source]
````
Completion criteria:
- [x] Template file created
- [x] All sections documented with guidance
- [x] Example runbook using template
- [x] Template reviewed by ops stakeholder
### RUN-001A - PostgreSQL Runbook (NEW)
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create comprehensive PostgreSQL operations runbook covering:
- Daily health checks
- Connection pool tuning
- Backup and restore
- Migration execution
- Incident procedures (pool exhaustion, slow queries, connectivity loss, disk space)
Completion criteria:
- [x] `postgres-ops.md` created using template
- [x] Standard procedures documented
- [x] Incident procedures documented
- [x] Monitoring dashboard references included
### RUN-001B - Crypto Subsystem Runbook (NEW)
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create comprehensive crypto operations runbook covering:
- Regional crypto profiles (International, FIPS, eIDAS, GOST, SM)
- Key rotation procedures
- Certificate renewal
- HSM health checks
- Incident procedures (HSM unavailable, key compromise, FIPS mode issues)
Completion criteria:
- [x] `crypto-ops.md` created using template
- [x] All regional profiles documented
- [x] Standard procedures documented
- [x] Incident procedures documented
### RUN-001C - Evidence Locker Runbook (NEW)
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create comprehensive evidence locker runbook covering:
- Daily integrity checks
- Index maintenance
- Merkle anchoring
- Storage cleanup
- Incident procedures (integrity failures, retrieval failures, anchor chain breaks)
- Disaster recovery
Completion criteria:
- [x] `evidence-locker-ops.md` created using template
- [x] Standard procedures documented
- [x] Incident procedures documented
- [x] DR procedures documented
### RUN-001D - Backup/Restore Runbook (NEW)
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create comprehensive backup/restore runbook covering:
- Manual backup creation
- Backup verification
- Full and component restore
- Point-in-time recovery
- Incident procedures (backup failure, restore failure, storage full)
- Disaster recovery scenarios
- Offline/air-gap backup
Completion criteria:
- [x] `backup-restore-ops.md` created using template
- [x] All backup types documented
- [x] Restore procedures documented
- [x] DR scenarios documented
### RUN-002 - Scanner Runbooks
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create runbooks for Scanner module:
1. `scanner-worker-stuck.md` - Worker not processing jobs
2. `scanner-oom.md` - Scanner out of memory on large images
3. `scanner-timeout.md` - Scan timeout on complex images
4. `scanner-registry-auth.md` - Registry authentication failures
5. `scanner-sbom-generation-failed.md` - SBOM generation failures
Each runbook should reference relevant Doctor checks and CLI commands.
Completion criteria:
- [x] All 5 runbooks created using template
- [x] Each links to relevant Doctor checks
- [x] CLI commands for diagnosis included
- [x] Resolution steps tested/verified
### RUN-003 - Policy Engine Runbooks
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create runbooks for Policy Engine:
1. `policy-evaluation-slow.md` - Policy evaluation latency high
2. `policy-opa-crash.md` - OPA process crashed
3. `policy-compilation-failed.md` - Rego compilation errors
4. `policy-storage-unavailable.md` - Policy storage backend down
5. `policy-version-mismatch.md` - Policy version conflicts
Completion criteria:
- [x] All 5 runbooks created using template
- [x] Each links to `PolicyEngineHealthCheck`
- [x] OPA-specific diagnosis steps included
- [x] Policy rollback procedures documented
### RUN-004 - Release Orchestrator Runbooks
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create runbooks for Release Orchestrator:
1. `orchestrator-promotion-stuck.md` - Promotion job not progressing
2. `orchestrator-gate-timeout.md` - Gate evaluation timeout
3. `orchestrator-evidence-missing.md` - Required evidence not found
4. `orchestrator-rollback-failed.md` - Rollback operation failed
5. `orchestrator-quota-exceeded.md` - Promotion quota exhausted
Completion criteria:
- [x] All 5 runbooks created using template
- [x] Each includes promotion state diagnosis
- [x] Evidence chain troubleshooting included
- [x] Quota management procedures documented
### RUN-005 - Attestor Runbooks
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create runbooks for Attestor:
1. `attestor-signing-failed.md` - Signature generation failures
2. `attestor-key-expired.md` - Signing key expired
3. `attestor-rekor-unavailable.md` - Rekor transparency log unreachable
4. `attestor-verification-failed.md` - Attestation verification failures
5. `attestor-hsm-connection.md` - HSM connection issues
Reference existing Doctor checks: `SigningKeyExpirationCheck`, `RekorConnectivityCheck`, etc.
Completion criteria:
- [x] All 5 runbooks created using template
- [x] Links to all relevant Attestor Doctor checks
- [x] Key rotation procedures documented
- [x] Offline mode fallback documented
### RUN-006 - Feed Connector Runbooks
Status: DONE
Dependency: RUN-001
Owners: Documentation author
Task description:
Create runbooks for advisory feed connectors (one per major connector):
1. `connector-nvd.md` - NVD connector failures
2. `connector-ghsa.md` - GitHub Security Advisories failures
3. `connector-osv.md` - OSV connector failures
4. `connector-vendor-specific.md` - Template for vendor connectors (RedHat, Ubuntu, etc.)
Each should cover:
- Authentication failures
- Rate limiting
- Data format changes
- Offline bundle refresh
Completion criteria:
- [x] Core connector runbooks created
- [x] Rate limiting handling documented
- [x] Offline bundle procedures included
- [x] Connector reason codes referenced
### RUN-007 - Runbook Coverage Tracking
Status: DONE
Dependency: RUN-002, RUN-003, RUN-004, RUN-005, RUN-006
Owners: Documentation author
Task description:
Create runbook coverage tracking document at `docs/operations/runbooks/COVERAGE.md`:
| Module | Critical Failures | Runbooks | Coverage |
|--------|-------------------|----------|----------|
| Scanner | 5 | 5 | 100% |
| Policy | 5 | 5 | 100% |
| ... | ... | ... | ... |
Include:
- Coverage percentage per module
- Gap list for modules without runbooks
- Priority ranking for missing runbooks
- Link to runbook template
Completion criteria:
- [x] Coverage document created
- [x] All modules listed with coverage %
- [x] Gaps clearly identified
- [x] Linked from docs index
### RUN-008 - Doctor Check Runbook Links
Status: DONE
Dependency: RUN-002, RUN-003, RUN-004, RUN-005, RUN-006
Owners: Developer/Implementer
Task description:
Update Doctor check implementations to include runbook links in remediation output:
```csharp
.WithRemediation(rb => rb
.AddStep(1, "Check scanner status", "stella scanner status")
.WithRunbookUrl("https://docs.stella-ops.org/runbooks/scanner-worker-stuck")
...
)
```
This makes runbooks discoverable directly from Doctor output.
Completion criteria:
- [x] `RemediationBuilder` supports runbook links
- [x] All covered Doctor checks link to runbooks
- [x] Links render in CLI and UI output
- [x] Unit tests for runbook link rendering
## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-01-17 | Sprint created from AI Economics Moat advisory gap analysis. | Planning |
| 2026-01-17 | RUN-001, RUN-001A-D, RUN-007 completed. Template exists, 4 new comprehensive runbooks created (postgres-ops, crypto-ops, evidence-locker-ops, backup-restore-ops), coverage tracking document created. | Documentation |
| 2026-01-17 | Additional runbooks created: scanner-worker-stuck, scanner-oom, scanner-timeout, scanner-registry-auth, policy-evaluation-slow, policy-opa-crash, orchestrator-promotion-stuck, attestor-signing-failed, attestor-key-expired, connector-nvd. 10 new module-specific runbooks added. | Documentation |
| 2026-01-17 | More runbooks created: scanner-sbom-generation-failed, orchestrator-gate-timeout, orchestrator-evidence-missing, attestor-hsm-connection, attestor-verification-failed, connector-ghsa, connector-osv, policy-compilation-failed. Total: 18 module-specific runbooks now exist. | Documentation |
| 2026-01-17 | RUN-002 through RUN-006 marked complete. All runbooks verified present in docs/operations/runbooks/. RUN-008 (Doctor runbook links) is the only remaining task. | Planning |
| 2026-01-17 | Final runbooks created: policy-storage-unavailable, policy-version-mismatch, orchestrator-rollback-failed, orchestrator-quota-exceeded, attestor-rekor-unavailable, connector-vendor-specific (template). All 25 runbooks now complete. | Documentation |
| 2026-01-17 | RUN-008 completed. WithRunbookUrl method added to RemediationBuilder, RunbookUrl property added to Remediation model and RemediationDto, unit tests added. | Developer |
## Decisions & Risks
- **Decision needed:** Should runbooks be versioned alongside code or maintained separately? Recommend: In-repo with code, versioned together.
- **Decision needed:** What's the minimum coverage threshold before declaring "operability moat" achieved? Recommend: 80% of critical failure modes.
- **Risk:** Runbooks may become stale as code evolves. Mitigation: Link runbooks to Doctor checks; stale check = stale runbook signal.
- **Risk:** Too many runbooks may be overwhelming. Mitigation: Use consistent template, clear severity tags, good search/index.
## Next Checkpoints
- Template and Scanner runbooks: +3 working days
- Policy and Orchestrator runbooks: +3 working days
- Attestor and Connector runbooks: +3 working days
- Coverage tracking and Doctor links: +2 working days

# Product Advisory: AI Economics Moat
ID: ADVISORY-20260116-AI-ECON-MOAT
Status: ACTIVE
Owner intent: Product-wide directive
Scope: All modules, docs, sprints, and roadmap decisions
## 0) Thesis (why this advisory exists)
In AI economics, code is cheap; software is expensive.
Competitors (and future competitors) can produce large volumes of code quickly. Stella Ops must remain hard to catch by focusing on the parts that are still expensive:
- trust
- operability
- determinism
- evidence integrity
- low-touch onboarding
- low support burden at scale
This advisory defines the product-level objectives and non-negotiable standards that make Stella Ops defensible against "code producers".
## 1) Product positioning (the class we must win)
Stella Ops Suite must be "best in class" for:
Evidence-grade release orchestration for containerized applications outside Kubernetes.
Stella is NOT attempting to be:
- a generic CD platform (Octopus, GitLab, Jenkins replacements)
- a generic vulnerability scanner (Trivy, Grype replacements)
- a "platform of everything" with infinite integrations
The moat is the end-to-end chain:
digest identity -> evidence -> verdict -> gate -> promotion -> audit export -> deterministic replay
The product wins when customers can run verified releases with minimal human labor and produce auditor-ready evidence.
## 2) Target customer and adoption constraint
Constraint: founder operates solo until ~100 paying customers.
Therefore, the product must be self-serve by default:
- install must be predictable
- failures must be diagnosable without maintainer time
- docs must replace support
- "Doctor" must replace debugging sessions
Support must be an exception, not a workflow.
## 3) The five non-negotiable product invariants
Every meaningful product change MUST preserve and strengthen these invariants:
I1. Evidence-grade by design
- Every verified decision has an evidence trail.
- Evidence is exportable, replayable, and verifiable.
I2. Deterministic replay
- Same inputs -> same outputs.
- A verdict can be reproduced and verified later, not just explained.
I3. Digest-first identity
- Releases are immutable digests, not mutable tags.
- "What is deployed where" is anchored to digests.
I4. Offline-first posture
- Air-gapped and low-egress environments must remain first-class.
- No hidden network dependencies in core flows.
I5. Low-touch operability
- Misconfigurations fail fast at startup with clear messages.
- Runtime failures have deterministic recovery playbooks.
- Doctor provides actionable diagnostics bundles and remediation steps.
If a proposed feature weakens any invariant, it must be rejected or redesigned.
## 4) Moats we build (how Stella stays hard to catch)
M1. Evidence chain continuity (no "glue work" required)
- Scan results, reachability proofs, policy evaluation, approvals, promotions, and exports are one continuous chain.
- Do not require customers to stitch multiple tools together to get audit-grade releases.
M2. Explainability with proof, not narrative
- "Why blocked?" must produce a deterministic trace + referenced evidence artifacts.
- The answer must be replayable, not a one-time explanation.
M3. Operability moat (Doctor + safe defaults)
- Diagnostics must identify root cause, not just symptoms.
- Provide deterministic checklists and fixes.
- Every integration must ship with health checks and failure-mode docs.
M4. Controlled surface area (reduce permutations)
- Ship a small number of Tier-1 golden integrations and targets.
- Keep the plugin system as an escape valve, but do not expand the maintained matrix beyond what solo operations can support.
M5. Standards-grade outputs with stable schemas
- SBOM, VEX, attestations, exports, and decision records must be stable, versioned, and backwards compatible where promised.
- Stability is a moat: auditors and platform teams adopt what they can depend on.
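The "proof, not narrative" idea behind M2 can be sketched as a pure function over recorded evidence. This is an illustrative Python sketch with hypothetical names; the point is that a stable rule ordering makes the trace replayable, not a description of the actual policy engine:

```python
def why_blocked(evidence: dict, policy_rules: list[dict]) -> list[str]:
    """Deterministic trace: the same evidence and the same rules always yield
    the same ordered list of reason codes, so the answer can be replayed later."""
    reasons = []
    for rule in sorted(policy_rules, key=lambda r: r["id"]):  # stable ordering
        if rule["predicate"](evidence):
            reasons.append(f'{rule["id"]}: {rule["reason"]}')
    return reasons
```

Because the function is pure and the ordering is fixed, re-running it against the archived evidence reproduces the original verdict rather than re-explaining it.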
## 5) Explicit non-goals (what to reject quickly)
Reject or de-prioritize proposals that primarily:
- add a generic CD surface without evidence and determinism improvements
- expand integrations broadly without a "Tier-1" support model and diagnostics coverage
- compete on raw scanner breadth rather than evidence-grade gating outcomes
- add UI polish that does not reduce operator labor or support load
- add "AI features" that create nondeterminism or require external calls in core paths
If a feature does not strengthen at least one moat (M1-M5), it is likely not worth shipping now.
## 6) Agent review rubric (use this to evaluate any proposal, advisory, or sprint)
When reviewing any new idea, feature request, PRD, or sprint, score it against:
A) Moat impact (required)
- Which moat does it strengthen (M1-M5)?
- What measurable operator/auditor outcome improves?
B) Support burden risk (critical)
- Does this increase the probability of support tickets?
- Does Doctor cover the new failure modes?
- Are there clear runbooks and error messages?
C) Determinism and evidence risk (critical)
- Does this introduce nondeterminism?
- Are outputs stable, canonical, and replayable?
- Does it weaken evidence chain integrity?
D) Permutation risk (critical)
- Does this increase the matrix of supported combinations?
- Can it be constrained to a "golden path" configuration?
E) Time-to-value impact (important)
- Does this reduce time to first verified release?
- Does it reduce time to answer "why blocked"?
If a proposal scores poorly on B/C/D, it must be redesigned or rejected.
## 7) Definition of Done (feature-level) - do not ship without the boring parts
Any shippable feature must include, at minimum:
DOD-1: Operator story
- Clear user story for operators and auditors, not just developers.
DOD-2: Failure modes and recovery
- Documented expected failures, error codes/messages, and remediation steps.
- Doctor checks added or extended to cover the common failure paths.
DOD-3: Determinism and evidence
- Deterministic outputs where applicable.
- Evidence artifacts linked to decisions.
- Replay or verify path exists if the feature affects verdicts or gates.
DOD-4: Tests
- Unit tests for logic (happy + edge cases).
- Integration tests for contracts (DB, queues, storage where used).
- Determinism tests when outputs are serialized, hashed, or signed.
DOD-5: Documentation
- Docs updated where the feature changes behavior or contracts.
- Include copy/paste examples for the golden path usage.
DOD-6: Observability
- Structured logs and metrics for success/failure paths.
- Explicit "reason codes" for gate decisions and failures.
If the feature cannot afford these, it cannot afford to exist in a solo-scaled product.
## 8) Product-level metrics (what we optimize)
These metrics are the scoreboard. Prioritize work that improves them.
P0 metrics (most important):
- Time-to-first-verified-release (fresh install -> verified promotion)
- Mean time to answer "why blocked?" (with proof)
- Support minutes per customer per month (must trend toward near-zero)
- Determinism regressions per release (must be near-zero)
P1 metrics:
- Noise reduction ratio (reachable actionable findings vs raw findings)
- Audit export acceptance rate (auditors can consume without manual reconstruction)
- Upgrade success rate (low-friction updates, predictable migrations)
## 9) Immediate product focus areas implied by this advisory
When unsure what to build next, prefer investments in:
- Doctor: diagnostics coverage, fix suggestions, bundles, and environment validation
- Golden path onboarding: install -> connect -> scan -> gate -> promote -> export
- Determinism gates in CI and runtime checks for canonical outputs
- Evidence export bundles that map to common audit needs
- "Why blocked" trace quality, completeness, and replay verification
Avoid "breadth expansion" unless it includes full operability coverage.
## 10) How to apply this advisory in planning
When processing this advisory:
- Ensure docs reflect the invariants and moats at the product overview level.
- Ensure sprints and tasks reference which moat they strengthen (M1-M5).
- If a sprint increases complexity without decreasing operator labor or improving evidence integrity, treat it as suspect.
Archive this advisory only if it is superseded by a newer product-wide directive.