Files
git.stella-ops.org/docs/task-packs/runbook.md
master 4e3e575db5 feat: Implement console session management with tenant and profile handling
- Add ConsoleSessionStore for managing console session state including tenants, profile, and token information.
- Create OperatorContextService to manage operator context for orchestrator actions.
- Implement OperatorMetadataInterceptor to enrich HTTP requests with operator context metadata.
- Develop ConsoleProfileComponent to display user profile and session details, including tenant information and access tokens.
- Add corresponding HTML and SCSS for ConsoleProfileComponent to enhance UI presentation.
- Write unit tests for ConsoleProfileComponent to ensure correct rendering and functionality.
2025-10-28 09:59:09 +02:00

163 lines
6.7 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

> **Imposed rule:** Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
# Task Pack Operations Runbook
This runbook guides SREs and on-call engineers through executing, monitoring, and troubleshooting Task Packs using the Task Runner service, Packs Registry, and StellaOps CLI. It aligns with Sprint43 deliverables (approvals workflow, notifications, chaos resilience).
---
## 1·Quick Reference
| Action | Command / UI | Notes |
|--------|--------------|-------|
| Validate pack | `stella pack validate --bundle <file>` | Run before publishing or importing. |
| Plan pack run | `stella pack plan --inputs inputs.json` | Outputs plan hash, required approvals, secret summary. |
| Execute pack | `stella pack run --pack <id>:<version>` | Streams logs; prompts for secrets/approvals if allowed. |
| Approve gate | Console notifications or `stella pack approve --run <id> --gate <gate>` | Requires `Packs.Approve`. |
| View run | Console `/console/packs/runs/:id` or `stella pack runs show <id>` | SSE stream available for live status. |
| Export evidence | `stella pack runs export --run <id>` | Produces bundle with plan, logs, artifacts, attestations. |
---
## 2·Run Lifecycle
1. **Submission**
- CLI/Orchestrator submits run with inputs, pack version, tenant context.
- Task Runner validates pack hash, scopes, sealed-mode constraints.
2. **Plan & Simulation**
- Runner caches plan graph; optional simulation diff recorded.
3. **Approvals**
- Gates emit notifications (`NOTIFY-SVC-40-001`).
- Approvers can approve/resume via CLI, Console, or API.
4. **Execution**
- Steps executed per plan (sequential/parallel).
- Logs streamed via SSE (`/task-runner/runs/{id}/logs`).
5. **Evidence & Attestation**
- On completion, DSSE attestation + evidence bundle stored.
- Exports available via Export Center.
6. **Cleanup**
- Artifacts retained per retention policy (default 30d).
- Mirror pack run manifest to Offline Kit if configured.
---
## 3·Monitoring & Telemetry
- **Metrics dashboards:** `task-runner` Grafana board.
- `pack_run_active` active runs per tenant.
- `pack_step_duration_seconds` histograms per step type.
- `pack_gate_wait_seconds` approval wait time (alert >30m).
- `pack_run_success_ratio` success vs failure rate.
- **Logs:** Search by `runId`, `packId`, `tenant`, `stepId`.
- **Traces:** Query `taskrunner.run` span in Tempo/Jaeger.
- **Notifications:** Subscribe to `pack.run.*` topics via Notifier for Slack/email/PagerDuty hooks.
Observability configuration referenced in Task Runner tasks (OBS-50-001..55-001).
---
## 4·Approvals Workflow
- Approvals may be requested via Console banner, CLI prompt, or email/Slack.
- Approver roles: `Packs.Approve` + tenant membership.
- CLI command:
```bash
stella pack approve \
--run run:tenant:timestamp \
--gate security-review \
--comment "Validated remediation scope; proceeding."
```
- Auto-expiry triggers run cancellation (configurable per gate).
- Approval events logged and included in evidence bundle.
---
## 5·Secrets Handling
- Secrets retrieved via Authority secure channel or CLI profile.
- Task Runner injects secrets into isolated environment variables or temp files (auto-shredded).
- Logs redact secrets; evidence bundles include only secret metadata (name, scope, last four characters).
- For sealed mode, secrets must originate from sealed vault (configured via `TASKRUNNER_SEALED_VAULT_URL`).
---
## 6·Failure Recovery
| Scenario | Symptom | Resolution |
|----------|---------|------------|
| **Plan hash mismatch** | Run aborted with `ERR_PACK_HASH_MISMATCH`. | Re-run `stella pack plan`; ensure pack not modified post-plan. |
| **Approval timeout** | `ERR_PACK_APPROVAL_TIMEOUT`. | Requeue run with extended TTL or escalate to approver; verify notifications delivered. |
| **Secret missing** | Run fails at injection step. | Provide secret via CLI (`--secrets`) or configure profile; check Authority scope. |
| **Network blocked (sealed)** | `ERR_PACK_NETWORK_BLOCKED`. | Update pack to avoid external calls or whitelist domain via AirGap policy. |
| **Artifact upload failure** | Evidence missing, logs show storage errors. | Retry run with `--resume` (if supported); verify object storage health. |
| **Runner chaos trigger** | Run paused with chaos event note. | Review chaos test plan; resume if acceptable or cancel run. |
`stella pack runs resume --run <id>` resumes paused runs post-remediation (approvals or transient failures).
---
## 7·Chaos & Resilience
- Chaos hooks pause runs, drop network, or delay approvals to test resilience.
- Track chaos events via `pack.chaos.injected` timeline entries.
- Post-chaos, ensure metrics return to baseline; record findings in Ops log.
---
## 8·Offline & Air-Gapped Execution
- Use `stella pack mirror pull` to import packs into sealed registry.
- CLI caches bundles under `~/.stella/packs/` for offline runs.
- Approvals require offline process:
- Generate approval request bundle (`stella pack approve --offline-request`).
- Approver signs bundle using offline CLI.
- Import approval via `stella pack approve --offline-response`.
- Evidence bundles exported to removable media; verify checksums before upload to online systems.
---
## 9·Runbooks for Common Packs
Maintain per-pack playbooks in `docs/task-packs/runbook/<pack-name>.md`. Include:
- Purpose and scope.
- Required inputs and secrets.
- Approval stakeholders.
- Pre-checks and post-checks.
- Rollback procedures.
The Docs Guild can use this root runbook as a template.
---
## 10·Escalation Matrix
| Issue | Primary | Secondary | Notes |
|-------|---------|-----------|-------|
| Pack validation errors | DevEx/CLI Guild | Task Runner Guild | Provide pack bundle + validation output. |
| Approval pipeline failure | Task Runner Guild | Authority Core | Confirm scope/role mapping. |
| Registry outage | Packs Registry Guild | DevOps Guild | Use mirror fallback if possible. |
| Evidence integrity issues | Evidence Locker Guild | Security Guild | Validate DSSE attestations, escalate if tampered. |
Escalations must include run ID, tenant, pack version, plan hash, and timestamps.
---
## 11·Compliance Checklist
- [ ] Run lifecycle documented (submission → evidence).
- [ ] Monitoring metrics, logs, traces, and notifications captured.
- [ ] Approvals workflow instructions provided (CLI + Console).
- [ ] Secret handling, sealed-mode constraints, and offline process described.
- [ ] Failure scenarios + recovery steps listed.
- [ ] Chaos/resilience guidance included.
- [ ] Escalation matrix defined.
- [ ] Imposed rule reminder included at top.
---
*Last updated: 2025-10-27 (Sprint43).*