This commit is contained in:
StellaOps Bot
2025-11-29 02:19:50 +02:00
parent 2548abc56f
commit b34f13dc03
86 changed files with 9625 additions and 640 deletions

View File

@@ -0,0 +1,447 @@
# Task Pack Orchestration and Automation
**Version:** 1.0
**Date:** 2025-11-29
**Status:** Canonical
This advisory defines the product rationale, DSL semantics, and implementation strategy for the TaskRunner module, covering pack manifest structure, execution semantics, approval workflows, and evidence capture.
---
## 1. Executive Summary
The TaskRunner provides **deterministic, auditable automation** for security workflows. Key capabilities:
- **Task Pack DSL** - Declarative YAML manifests for multi-step workflows
- **Approval Gates** - Human-in-the-loop checkpoints with Authority integration
- **Deterministic Execution** - Plan hash verification prevents runtime divergence
- **Evidence Capture** - DSSE attestations for provenance and audit
- **Air-Gap Support** - Sealed-mode validation for offline installations
---
## 2. Market Drivers
### 2.1 Target Segments
| Segment | Automation Requirements | Use Case |
|---------|------------------------|----------|
| **Enterprise Security** | Approval workflows for vulnerability remediation | Change advisory board gates |
| **DevSecOps** | CI/CD pipeline integration | Automated policy enforcement |
| **Compliance Teams** | Auditable execution with evidence | SOC 2, FedRAMP documentation |
| **MSP/MSSP** | Multi-tenant orchestration | Managed security services |
### 2.2 Competitive Positioning
Most vulnerability scanning tools lack built-in orchestration. Stella Ops differentiates with:
- **Declarative task packs** with schema validation
- **Cryptographic plan verification** (plan hash binding)
- **Native approval gates** with Authority token integration
- **Evidence attestations** for audit trails
- **Sealed-mode enforcement** for air-gapped environments
---
## 3. Technical Architecture
### 3.1 Pack Manifest Structure (v1)
```yaml
apiVersion: stellaops.io/pack.v1
kind: TaskPack
metadata:
name: vulnerability-scan-and-report
version: 1.2.0
description: Scan container, evaluate policy, generate report
tags: [security, compliance, scanning]
tenantVisibility: private
maintainers:
- name: Security Team
email: security@example.com
license: MIT
spec:
inputs:
- name: imageRef
type: string
required: true
schema:
pattern: "^[a-z0-9./-]+:[a-z0-9.-]+$"
- name: policyPack
type: string
default: "default-policy-v1"
secrets:
- name: registryCredentials
scope: scanner.read
description: Registry pull credentials
approvals:
- name: security-review
grants: ["security-lead", "ciso"]
ttlHours: 72
message: "Approve scan results before policy evaluation"
steps:
- id: scan
type: run
module: scanner/sbom-vuln
inputs:
image: "{{ inputs.imageRef }}"
outputs:
sbom: sbom.json
vulns: vulnerabilities.json
- id: review-gate
type: gate.approval
approval: security-review
dependsOn: [scan]
- id: policy-eval
type: run
module: policy/evaluate
inputs:
sbom: "{{ steps.scan.outputs.sbom }}"
vulns: "{{ steps.scan.outputs.vulns }}"
pack: "{{ inputs.policyPack }}"
dependsOn: [review-gate]
- id: generate-report
type: parallel
maxParallel: 2
steps:
- id: pdf-report
type: run
module: export/pdf
inputs:
data: "{{ steps.policy-eval.outputs.results }}"
- id: json-report
type: run
module: export/json
inputs:
data: "{{ steps.policy-eval.outputs.results }}"
dependsOn: [policy-eval]
outputs:
- name: scanReport
type: file
path: "{{ steps.generate-report.steps.pdf-report.outputs.file }}"
- name: machineReadable
type: object
value: "{{ steps.generate-report.steps.json-report.outputs.data }}"
success:
message: "Scan completed successfully"
failure:
message: "Scan failed - review logs"
retryPolicy:
maxRetries: 2
backoffSeconds: 60
```
### 3.2 Step Types
| Type | Purpose | Key Properties |
|------|---------|----------------|
| `run` | Execute module | `module`, `inputs`, `outputs` |
| `parallel` | Concurrent execution | `steps[]`, `maxParallel` |
| `map` | Iterate over list | `items`, `step`, `maxParallel` |
| `gate.approval` | Human approval checkpoint | `approval`, `timeout` |
| `gate.policy` | Policy Engine validation | `policy`, `failAction` |
### 3.3 Execution Semantics
**Plan Phase:**
1. Parse manifest and validate schema
2. Resolve input expressions
3. Build execution graph
4. Compute **canonical plan hash** (SHA-256 of normalized graph)
**Simulation Phase (Optional):**
1. Execute all steps in dry-run mode
2. Capture expected outputs
3. Store simulation results with plan hash
**Execution Phase:**
1. Verify runtime graph matches plan hash
2. Execute steps in dependency order
3. Emit progress events to Timeline
4. Capture output artifacts
**Evidence Phase:**
1. Generate DSSE attestation with plan hash
2. Include input digests and output manifests
3. Store in Evidence Locker
4. Optionally anchor to Rekor
---
## 4. Approval Workflow
### 4.1 Gate Definition
```yaml
approvals:
- name: security-review
grants: ["role/security-lead", "role/ciso"]
ttlHours: 72
message: "Review vulnerability findings before proceeding"
requiredCount: 1 # Number of approvals needed
```
### 4.2 Authority Token Contract
Approval tokens must include:
| Claim | Description |
|-------|-------------|
| `pack_run_id` | Run identifier (UUID) |
| `pack_gate_id` | Gate name from manifest |
| `pack_plan_hash` | Canonical plan hash |
| `auth_time` | Must be within 5 minutes of request |
### 4.3 CLI Approval Command
```bash
stella pack approve \
--run "run:tenant-default:20251129T120000Z" \
--gate security-review \
--pack-run-id "abc123..." \
--pack-gate-id "security-review" \
--pack-plan-hash "sha256:def456..." \
--comment "Reviewed findings, no critical issues"
```
### 4.4 Approval Events
| Event | Trigger |
|-------|---------|
| `pack.approval.requested` | Gate reached, awaiting approval |
| `pack.approval.granted` | Approval recorded |
| `pack.approval.denied` | Approval rejected |
| `pack.approval.expired` | TTL exceeded without approval |
---
## 5. Implementation Strategy
### 5.1 Phase 1: Core Execution (In Progress)
- [x] Telemetry core adoption (TASKRUN-OBS-50-001)
- [x] Metrics implementation (TASKRUN-OBS-51-001)
- [ ] Architecture/API contracts (TASKRUN-41-001) - BLOCKED
- [ ] Execution engine enhancements (TASKRUN-42-001) - BLOCKED
### 5.2 Phase 2: Approvals & Evidence (Planned)
- [ ] Timeline event emission (TASKRUN-OBS-52-001)
- [ ] Evidence locker snapshots (TASKRUN-OBS-53-001)
- [ ] DSSE attestations (TASKRUN-OBS-54-001)
- [ ] Incident mode escalations (TASKRUN-OBS-55-001)
### 5.3 Phase 3: Multi-Tenancy & Air-Gap (Planned)
- [ ] Tenant scoping and egress control (TASKRUN-TEN-48-001)
- [ ] Sealed-mode validation (TASKRUN-AIRGAP-56-001)
- [ ] Bundle ingestion for offline (TASKRUN-AIRGAP-56-002)
- [ ] Evidence capture in sealed mode (TASKRUN-AIRGAP-58-001)
---
## 6. API Surface
### 6.1 TaskRunner APIs
| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/runs` | POST | `packs.run` | Submit pack run |
| `/api/runs/{runId}` | GET | `packs.read` | Get run status |
| `/api/runs/{runId}/logs` | GET | `packs.read` | Stream logs (SSE) |
| `/api/runs/{runId}/artifacts` | GET | `packs.read` | List artifacts |
| `/api/runs/{runId}/approve` | POST | `packs.approve` | Record approval |
| `/api/runs/{runId}/cancel` | POST | `packs.run` | Cancel run |
### 6.2 Packs Registry APIs
| Endpoint | Method | Scope | Description |
|----------|--------|-------|-------------|
| `/api/packs` | GET | `packs.read` | List packs |
| `/api/packs/{packId}/versions` | GET | `packs.read` | List versions |
| `/api/packs/{packId}/versions/{version}` | GET | `packs.read` | Get manifest |
| `/api/packs/{packId}/versions` | POST | `packs.write` | Publish pack |
| `/api/packs/{packId}/promote` | POST | `packs.write` | Promote channel |
### 6.3 CLI Commands
```bash
# Initialize pack scaffold
stella pack init --name my-workflow
# Validate manifest
stella pack validate pack.yaml
# Dry-run simulation
stella pack plan pack.yaml --inputs image=nginx:latest
# Execute pack
stella pack run pack.yaml --inputs image=nginx:latest
# Build distributable bundle
stella pack build pack.yaml --output my-workflow-1.0.0.tar.gz
# Sign bundle
cosign sign-blob my-workflow-1.0.0.tar.gz
# Publish to registry
stella pack push my-workflow-1.0.0.tar.gz --registry packs.example.com
# Export for offline distribution
stella pack bundle export --pack my-workflow --version 1.0.0
```
---
## 7. Storage Model
### 7.1 MongoDB Collections
**pack_runs:**
| Field | Type | Description |
|-------|------|-------------|
| `_id` | string | Run identifier |
| `planHash` | string | Canonical plan hash |
| `plan` | object | Full TaskPackPlan |
| `failurePolicy` | object | Retry/backoff config |
| `requestedAt` | date | Client request time |
| `tenantId` | string | Tenant scope |
| `steps` | array | Step execution records |
**pack_run_logs:**
| Field | Type | Description |
|-------|------|-------------|
| `runId` | string | FK to pack_runs |
| `sequence` | long | Monotonic counter |
| `timestamp` | date | Event time (UTC) |
| `level` | string | trace/debug/info/warn/error |
| `eventType` | string | Machine identifier |
| `stepId` | string | Optional step reference |
**pack_artifacts:**
| Field | Type | Description |
|-------|------|-------------|
| `runId` | string | FK to pack_runs |
| `name` | string | Output name |
| `type` | string | file/object/url |
| `storedPath` | string | Object store URI |
| `status` | string | pending/copied/materialized |
---
## 8. Evidence & Attestation
### 8.1 DSSE Attestation Structure
```json
{
"payloadType": "application/vnd.stellaops.pack-run+json",
"payload": {
"runId": "abc123...",
"packName": "vulnerability-scan-and-report",
"packVersion": "1.2.0",
"planHash": "sha256:def456...",
"inputs": {
"imageRef": { "value": "nginx:latest", "digest": "sha256:..." }
},
"outputs": [
{ "name": "scanReport", "digest": "sha256:..." }
],
"steps": [
{ "id": "scan", "status": "completed", "duration": 45.2 }
],
"completedAt": "2025-11-29T12:30:00Z"
},
"signatures": [...]
}
```
### 8.2 Evidence Bundle
Task pack runs produce evidence bundles containing:
- Pack manifest (signed)
- Input values (redacted secrets)
- Output artifacts
- Step transcripts
- DSSE attestation
---
## 9. Determinism Requirements
All TaskRunner operations must maintain determinism:
1. **Plan hash binding** - Runtime graph must match computed plan hash
2. **Stable step ordering** - Dependencies resolve deterministically
3. **Expression evaluation** - Same inputs produce same resolved values
4. **Timestamps in UTC** - All logs and events use ISO-8601 UTC
5. **Secret masking** - Secrets never appear in logs or evidence
---
## 10. RBAC & Scopes
| Scope | Purpose |
|-------|---------|
| `packs.read` | Discover/download packs |
| `packs.write` | Publish/update packs (requires signature) |
| `packs.run` | Execute packs via CLI/TaskRunner |
| `packs.approve` | Fulfill approval gates |
**Approval Token Requirements:**
- `pack_run_id`, `pack_gate_id`, `pack_plan_hash` are mandatory
- Token must be fresh (within 5-minute auth window)
---
## 11. Related Documentation
| Resource | Location |
|----------|----------|
| Task Pack specification | `docs/task-packs/spec.md` |
| Authoring guide | `docs/task-packs/authoring-guide.md` |
| Operations runbook | `docs/task-packs/runbook.md` |
| Registry architecture | `docs/task-packs/registry.md` |
| MongoDB migrations | `docs/modules/taskrunner/migrations/pack-run-collections.md` |
---
## 12. Sprint Mapping
- **Primary Sprint:** SPRINT_0157_0001_0001_taskrunner_i.md
- **Phase II:** SPRINT_0158_0001_0002_taskrunner_ii.md
- **Blockers:** SPRINT_0157_0001_0002_taskrunner_blockers.md
**Key Task IDs:**
- `TASKRUN-41-001` - Architecture/API contracts (BLOCKED)
- `TASKRUN-42-001` - Execution engine enhancements (BLOCKED)
- `TASKRUN-OBS-50-001` - Telemetry core adoption (DONE)
- `TASKRUN-OBS-51-001` - Metrics implementation (DONE)
- `TASKRUN-OBS-52-001` - Timeline events (BLOCKED)
---
## 13. Success Metrics
| Metric | Target |
|--------|--------|
| Plan hash verification | 100% match or abort |
| Approval gate response | < 5 min for high-priority |
| Evidence attestation rate | 100% of completed runs |
| Offline execution success | Works in sealed mode |
| Step execution latency | < 2s overhead per step |
---
*Last updated: 2025-11-29*