preparation for ui re-shelling

2026-02-18 23:03:07 +02:00
parent cb3e361fcf
commit c2f13fe588
46 changed files with 16727 additions and 0 deletions
--- a/docs/modules/ui/v2-rewire/pack-14.md
+++ b/docs/modules/ui/v2-rewire/pack-14.md
@@ -0,0 +1,482 @@
+## Pack 14 — Release Run / Deployment Timeline (workflow checkpoints, logs, rollback, evidence capture, replay/verify)
+
+This pack adds the **“run view”** that ties together everything Stella Ops promises: *promote by digest, explain every decision, evidence-backed audit, deterministic replay* — without turning reachability into a top-level area.
+
+---
+
+# 14.1 Menu graph (Mermaid) — where “Release Run” sits in the IA
+
+```mermaid
+flowchart TD
+ROOT[Stella Ops Console] --> REL[Releases]
+ROOT --> APPR[Approvals]
+ROOT --> EVID[Evidence]
+ROOT --> OPS[Operations]
+ROOT --> RC["Release Control (ROOT)"]
+ROOT --> INT[Integrations]
+ROOT --> SEC[Security]
+
+REL --> REL_LIST["Releases (Promotions)"]
+REL_LIST --> PROMO_DETAIL[Promotion Detail]
+PROMO_DETAIL --> RUN_TAB[Run / Timeline]
+RUN_TAB --> STEP_DETAIL[Step Detail: logs + artifacts + evidence]
+RUN_TAB --> ROLLBACK[Rollback / Re-run]
+RUN_TAB --> SCHEDULE[Schedule / Automation]
+
+STEP_DETAIL -. export evidence .-> EVID
+STEP_DETAIL -. replay policy .-> EVID
+RUN_TAB -. ops health .-> OPS
+
+EVID --> PKT[Packets]
+EVID --> CHAIN[Proof Chains]
+EVID --> REPLAY[Replay/Verify]
+EVID --> EXPORT[Export Center]
+EVID --> BUNDLES[Evidence Bundles]
+
+OPS --> ORCH[Orchestrator]
+OPS --> SCHED[Scheduler Runs]
+OPS --> DLQ[Dead Letter]
+OPS --> FEEDS[Feeds + AirGap Ops]
+OPS --> HEALTH[Platform Health]
+
+RUN_TAB -. links to .-> ORCH
+RUN_TAB -. links to .-> SCHED
+RUN_TAB -. links to .-> FEEDS
+RUN_TAB -. links to .-> HEALTH
+
+PROMO_DETAIL -. findings snapshot .-> SEC
+PROMO_DETAIL -. env inputs .-> RC
+PROMO_DETAIL -. secrets/providers .-> INT
+```
+
+---
+
+# 14.2 Run lifecycle graph (Mermaid) — promotion execution stages + checkpoints
+
+```mermaid
+flowchart LR
+A[Promotion Created] --> B[Inputs Materialized]
+B --> C[Policy Gate Eval]
+C --> D{Approval Required?}
+D -- yes --> E[Approval Decision]
+D -- no --> F[Deploy Workflow Start]
+
+E --> F
+F --> G[Canary 10%]
+G --> H{SLO/Health OK?}
+H -- no --> R[Auto-Rollback / Pause]
+H -- yes --> I[Canary 50%]
+I --> J{SLO/Health OK?}
+J -- no --> R
+J -- yes --> K[100% Rollout]
+K --> L[Post-Deploy Verify]
+L --> M[Finalize + Seal Evidence]
+M --> N[Promotion Complete]
+
+%% Evidence capture points
+C -. DSSE policy decision .-> EV[Evidence Pack]
+F -. provenance/attestations .-> EV
+L -. runtime reachability snapshot .-> EV
+M -. Rekor/tlog receipts .-> EV
+```
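
The lifecycle graph above can also be read as a small state machine. The following is a minimal Python sketch under stated assumptions: the `Stage` names mirror the graph labels, while `next_stage`, `walk`, and their parameters are hypothetical helpers for illustration, not part of Stella Ops. It follows the graph as drawn, so an approval decision always leads on to the deploy workflow.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "Promotion Created"
    INPUTS = "Inputs Materialized"
    GATE = "Policy Gate Eval"
    APPROVAL = "Approval Decision"
    DEPLOY_START = "Deploy Workflow Start"
    CANARY_10 = "Canary 10%"
    CANARY_50 = "Canary 50%"
    ROLLOUT_100 = "100% Rollout"
    VERIFY = "Post-Deploy Verify"
    SEAL = "Finalize + Seal Evidence"
    COMPLETE = "Promotion Complete"
    ROLLBACK = "Auto-Rollback / Pause"


# Unconditional successors; the decision points are handled in next_stage().
_LINEAR = {
    Stage.CREATED: Stage.INPUTS,
    Stage.INPUTS: Stage.GATE,
    Stage.APPROVAL: Stage.DEPLOY_START,
    Stage.DEPLOY_START: Stage.CANARY_10,
    Stage.ROLLOUT_100: Stage.VERIFY,
    Stage.VERIFY: Stage.SEAL,
    Stage.SEAL: Stage.COMPLETE,
}


def next_stage(current, approval_required=False, health_ok=True):
    """Return the stage that follows `current`, or None at a terminal stage."""
    if current is Stage.GATE:
        return Stage.APPROVAL if approval_required else Stage.DEPLOY_START
    if current in (Stage.CANARY_10, Stage.CANARY_50):
        if not health_ok:
            return Stage.ROLLBACK
        return Stage.CANARY_50 if current is Stage.CANARY_10 else Stage.ROLLOUT_100
    return _LINEAR.get(current)


def walk(approval_required, health_ok_at):
    """Walk from CREATED to a terminal stage.

    `health_ok_at` maps a canary stage to its SLO/health result (default: healthy).
    """
    path, stage = [], Stage.CREATED
    while stage is not None:
        path.append(stage)
        stage = next_stage(stage, approval_required, health_ok_at.get(stage, True))
    return path
```

Calling `walk(False, {Stage.CANARY_50: False})` ends at `Stage.ROLLBACK` without reaching the 100% rollout, which is the path the timeline would render as a stopped run.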
+
+---
+
+# 14.3 Screen — Run / Timeline (Promotion Run)
+
+### Formerly (where it lived pre-redesign)
+
+Pieces existed but were **fragmented**:
+
+* **Control Plane** dashboard showed *Active Deployments* (high-level only).
+* **Operations → Orchestrator** (job access) and **Operations → Scheduler** (runs) covered operations but did not tell a release narrative.
+* Evidence was in **Evidence → Packets / Proof Chains / Export**, but not tied to a run timeline.
+* Any detailed logs typically lived outside Stella (CI/CD, deploy system, cluster logs).
+
+### Why changed like this
+
+* A release promotion must be **auditable as a single storyline**:
+
+  * what happened,
+  * when,
+  * what data it used,
+  * what it decided,
+  * what evidence was sealed at each checkpoint,
+  * and what actions are safe now (pause, rollback, replay).
+* This screen becomes the **single pane** that links out to specialized areas (Ops, Evidence), instead of forcing users to hunt.
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart TD
+A[Run / Timeline] --> B[Stage timeline with checkpoints]
+A --> C[Current status + next step]
+A --> D[Links to logs, artifacts, evidence]
+A --> E[Actions: pause/retry/rollback]
+A --> F[Data health banner: feeds/jobs/integrations]
+A --> G[Drill into Step Detail]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Promotion Run / Timeline                                                                       │
+│ Legacy name/location: No single screen. Pieces were Control Plane "Active Deployments" + Ops. │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Promotion: Platform Release 1.3.0-rc1  manifest sha256:beef...                                 │
+│ Target: EU-West / eu-stage → eu-prod         Workflow: Canary 10→50→100                         │
+│ Status: RUNNING (Canary 10%)            Started: Feb 18, 08:30                                  │
+│ Data health: WARN — NVD stale 3h | Rescan job failed (worker) | Jenkins degraded                │
+│ Links: [Ops Feeds] [System Jobs] [Integrations]                                                 │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Timeline (click any step)                                                                      │
+│  08:30  ✓ Inputs Materialized     (Vault/Consul resolved, 0 missing)           [View]          │
+│  08:31  ✓ Gate Eval (Policy)      PASS/WARN (reach runtime 35%)               [View]          │
+│  08:32  ✓ Approval               APPROVED by bob.smith                         [View]          │
+│  08:33  ▶ Deploy Canary 10%       RUNNING (2/10 targets healthy)               [View] [Pause] │
+│  ----    ○ Deploy Canary 50%      PENDING                                      [—]            │
+│  ----    ○ Deploy 100%            PENDING                                      [—]            │
+│  ----    ○ Post-Deploy Verify     PENDING                                      [—]            │
+│  ----    ○ Seal Evidence          PENDING                                      [—]            │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Quick actions: [Pause] [Retry step] [Rollback] [Export evidence (partial)] [Replay policy]     │
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+# 14.4 Screen — Step Detail (Logs + Artifacts + Evidence captured at that checkpoint)
+
+### Formerly
+
+* Logs: CI/CD (e.g., Jenkins), deploy agent logs, platform logs — outside Stella.
+* Evidence: visible only under **Evidence** menus and not connected to “the step that created it”.
+
+### Why changed like this
+
+* Step Detail is the “unit of explanation”.
+* Every meaningful checkpoint should show:
+
+  * **inputs** used,
+  * **outputs** produced,
+  * **logs**,
+  * **evidence items** sealed (or pending),
+  * and **links** to canonical storage (Evidence Packets / Proof Chains).
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart TD
+A[Step Detail] --> B[Overview: inputs/outputs + timestamps]
+A --> C["Logs (stream / download)"]
+A --> D["Artifacts (manifests, plans, diffs)"]
+A --> E["Evidence items (DSSE, receipts, proofs)"]
+A --> F[Actions: retry step / mark failed / pause]
+A --> G[Jump: Evidence Packet / Proof Chain]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Step Detail: Gate Eval (Policy)                                                                │
+│ Legacy name/location: gate result surfaced loosely on Approvals; evidence elsewhere.           │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Start: 08:31  End: 08:31:12  Duration: 12s   Result: PASS (2 WARN)                             │
+│ Inputs: bundle manifest sha256:beef... | baseline Prod-EU-West | feeds: NVD stale 3h            │
+│ Outputs: policy verdict id: verdict-123 | decision digest: sha256:dd77...                       │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Tabs: [Overview] [Logs] [Artifacts] [Evidence]                                                  │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Evidence captured                                                                              │
+│  ✓ DSSE envelope: policy-decision.dsse  (digest sha256:dd77...)                                 │
+│  ✓ Rekor receipt: rekor-entry.json      (tlog index 9918271)                                    │
+│  ○ Proof chain: pending until "Seal Evidence" step                                              │
+│ Links: [Open Evidence Packet] [Open Proof Chain] [Replay this Verdict]                          │
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+# 14.5 Screen — Deploy Stage View (targets, health, checkpoints, rollback triggers)
+
+### Formerly
+
+* “Active Deployments” showed minimal progress.
+* Detailed rollout/targets health likely lived in your deploy system (outside Stella).
+* Platform Health screen exists, but not contextualized to a specific promotion.
+
+### Why changed like this
+
+* This is where “release operations” actually happens:
+
+  * show **targets** in the region/env,
+  * show **health gates** / SLO checks,
+  * show **automatic rollback triggers**,
+  * link to platform health and logs.
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart TD
+A[Deploy Stage View] --> B["Targets table (per region/env)"]
+A --> C[SLO / health checks]
+A --> D[Auto-rollback rules + trigger state]
+A --> E[Actions: pause/continue/rollback]
+A --> F[Link: Platform Health]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Step Detail: Deploy Canary 10%                                                                  │
+│ Legacy name/location: Control Plane "Active Deployments" (summary only) + external deploy logs │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Stage: Canary 10%   Policy: proceed if 95% healthy for 5m, error rate < 1%                      │
+│ Current: 2/10 healthy  | Error rate: 0.4% | Latency p95: 210ms | SLO: OK                         │
+│ Auto-rollback trigger: NOT TRIGGERED                                                            │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Targets (EU-West / eu-prod)                                                                     │
+│ ┌───────────────┬───────────┬──────────┬──────────────┬───────────────┐                        │
+│ │ Target         │ Version    │ Health   │ Notes         │ Logs          │                        │
+│ ├───────────────┼───────────┼──────────┼──────────────┼───────────────┤                        │
+│ │ eu-prod-01     │ bundle@beef│ ✓        │ ok            │ [open]        │                        │
+│ │ eu-prod-02     │ bundle@beef│ ✓        │ ok            │ [open]        │                        │
+│ │ eu-prod-03     │ old        │ ○        │ pending       │ [open]        │                        │
+│ └───────────────┴───────────┴──────────┴──────────────┴───────────────┘                        │
+│ Actions: [Pause] [Continue to 50%] (disabled until criteria met) [Rollback] [Open Platform Health]│
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
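
The stage policy in the mock ("proceed if 95% healthy for 5m, error rate < 1%") could drive the disabled state of the "Continue to 50%" action. A minimal Python sketch, assuming the healthy percentage is measured over the stage's targets and that the caller supplies how long the stage has been stable; `GatePolicy` and `can_continue` are hypothetical names, not an existing Stella Ops API.

```python
from dataclasses import dataclass


@dataclass
class GatePolicy:
    min_healthy_ratio: float = 0.95   # "95% healthy"
    min_stable_seconds: int = 300     # "for 5m"
    max_error_rate: float = 0.01      # "error rate < 1%"


def can_continue(policy, healthy, total, stable_seconds, error_rate):
    """Return (ok, reasons): whether the next stage may be enabled, and why not."""
    reasons = []
    ratio = healthy / total if total else 0.0
    if ratio < policy.min_healthy_ratio:
        reasons.append(f"healthy {healthy}/{total} below {policy.min_healthy_ratio:.0%}")
    if stable_seconds < policy.min_stable_seconds:
        reasons.append(f"stable for {stable_seconds}s, need {policy.min_stable_seconds}s")
    if error_rate >= policy.max_error_rate:
        reasons.append(f"error rate {error_rate:.1%} not below {policy.max_error_rate:.0%}")
    return (not reasons, reasons)
```

Returning the reasons alongside the boolean lets the UI explain why the action is disabled instead of only greying it out.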
+
+---
+
+# 14.6 Screen — Rollback / Re-run (safe ops controls)
+
+### Formerly
+
+* Rollback existed as a **status** (“ROLLED_BACK”) in Releases list.
+* Actual rollback execution likely happened externally or via Orchestrator privileges.
+
+### Why changed like this
+
+* Rollback must be:
+
+  * explicit,
+  * traceable,
+  * evidence-backed (what was rolled back, why, and what is the resulting state).
+* Re-run is needed for transient failures (e.g., feed sync delay, rescan job retry), but it must preserve determinism: each re-run records new, timestamped evidence and keeps the earlier evidence intact.
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart TD
+A[Rollback/Re-run] --> B[Select scope: step / stage / full rollback]
+A --> C["Preview impact (targets + versions)"]
+A --> D[Reason + ticket]
+A --> E[Execute]
+E --> F[Run Timeline updates + evidence appended]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Rollback / Re-run                                                                              │
+│ Legacy name/location: Release status "ROLLED_BACK" existed; rollback execution was not unified │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Promotion: Platform Release 1.3.0-rc1  → EU-West/eu-prod                                       │
+│ Current stage: Canary 10% (RUNNING)                                                             │
+│                                                                                                 │
+│ Choose action:                                                                                │
+│  ( ) Re-run current step (Deploy Canary 10%)                                                    │
+│  ( ) Pause promotion                                                                           │
+│  ( ) Rollback to previously deployed bundle version (manifest sha256:prev...)                  │
+│                                                                                                 │
+│ Preview rollback impact:                                                                        │
+│  - 2 targets currently on new bundle → will revert to prev bundle                               │
+│  - 8 targets still old → unchanged                                                              │
+│                                                                                                 │
+│ Reason (required): [ incident #1234: elevated latency ]                                          │
+│ [Execute]   [Cancel]                                                                            │
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
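
The "Preview rollback impact" panel amounts to partitioning targets by the bundle they currently run. A minimal Python sketch, assuming each target reports the digest it has deployed; `preview_rollback` and the placeholder digest strings in the usage note are hypothetical.

```python
def preview_rollback(targets, new_digest, prev_digest):
    """Split targets into those that would revert and those left unchanged.

    `targets` maps target name -> digest currently deployed on that target.
    """
    revert = sorted(name for name, digest in targets.items() if digest == new_digest)
    unchanged = sorted(name for name, digest in targets.items() if digest != new_digest)
    return {"revert_to": prev_digest, "revert": revert, "unchanged": unchanged}
```

With ten targets of which two carry the new bundle, the result lists those two under `revert` and the other eight under `unchanged`, matching the counts shown in the mock.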
+
+---
+
+# 14.7 Screen — Evidence Timeline (what evidence exists now vs what seals at finalize)
+
+### Formerly
+
+* Evidence existed under:
+
+  * **Evidence → Packets**
+  * **Evidence → Proof Chains**
+  * **Evidence → Export**
+  * **Evidence → Evidence Bundles**
+
+  …but the *relationship to the run stages* wasn’t visible.
+
+### Why changed like this
+
+* Auditors and operators need to answer:
+
+  * “What evidence is already available mid-run?”
+  * “What is pending until completion?”
+  * “What exactly was sealed and when?”
+* This is the bridge between *Ops timeline* and *audit artifacts*.
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart LR
+A["Evidence Timeline (per promotion)"] --> B[Evidence items by checkpoint]
+A --> C[Open Packet]
+A --> D[Open Proof Chain]
+A --> E[Export Evidence Pack]
+A --> F[Generate Auditor Bundle]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Evidence Timeline — Promotion Run                                                              │
+│ Legacy name/location: Evidence artifacts existed, but not linked to run checkpoints             │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Checkpoint → Evidence                                                                           │
+│  Inputs Materialized                                                                            │
+│   ✓ resolved-inputs.json (hash sha256:aa11...)                                                   │
+│                                                                                                  │
+│  Gate Eval (Policy)                                                                             │
+│   ✓ policy-decision.dsse  ✓ rekor receipt  ✓ verdict-123                                         │
+│                                                                                                  │
+│  Deploy Canary 10%                                                                              │
+│   ○ deploy-attestation.dsse (pending)                                                            │
+│                                                                                                  │
+│  Seal Evidence (final)                                                                          │
+│   ○ proof-chain.json  ○ audit-pack.tar.gz  ○ evidence-bundle.zip                                 │
+│                                                                                                  │
+│ Actions: [Open Evidence Packet] [Open Proof Chain] [Export Pack (partial)] [Generate Auditor Bundle]│
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
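
The three auditor questions map onto a simple grouping of evidence items by checkpoint, split into sealed and pending. A minimal Python sketch, assuming each item is a `(checkpoint, name, sealed)` tuple; `group_evidence` and `export_label` are hypothetical helpers, and the file names are the ones shown in the mock.

```python
def group_evidence(items):
    """Group (checkpoint, name, sealed) tuples by checkpoint, preserving run order."""
    timeline = {}
    for checkpoint, name, sealed in items:
        entry = timeline.setdefault(checkpoint, {"sealed": [], "pending": []})
        entry["sealed" if sealed else "pending"].append(name)
    return timeline


def export_label(timeline):
    """Label the export action as partial while any evidence item is pending."""
    pending = any(entry["pending"] for entry in timeline.values())
    return "Export Pack (partial)" if pending else "Export Pack"


items = [
    ("Inputs Materialized", "resolved-inputs.json", True),
    ("Gate Eval (Policy)", "policy-decision.dsse", True),
    ("Deploy Canary 10%", "deploy-attestation.dsse", False),
    ("Seal Evidence (final)", "proof-chain.json", False),
]
timeline = group_evidence(items)
```

Deriving the export label from the same grouping keeps the "(partial)" marker consistent with what the timeline shows mid-run.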
+
+---
+
+# 14.8 Screen — Replay/Verify (contextual replay for *this run*)
+
+### Formerly
+
+* **Evidence → Replay/Verify** (“Verdict Replay”) existed as a standalone screen:
+
+  * user inputs verdict id or image reference,
+  * sees replay requests + determinism overview.
+
+### Why changed like this
+
+* Replay should be reachable from where it matters:
+
+  * a specific policy decision checkpoint in a promotion run.
+* Keep the existing Replay/Verify functionality, but add a **contextual wrapper**:
+
+  * pre-fills verdict id + bundle digest + env baseline,
+  * shows determinism status for this promotion.
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart TD
+A[Run → Replay/Verify] --> B[Pre-filled replay request]
+B --> C[Replay requests list]
+C --> D[Determinism metrics]
+D --> E[Link: Evidence → Replay/Verify canonical view]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Replay/Verify — For this Promotion                                                             │
+│ Legacy name/location: "Verdict Replay" (Evidence → Replay/Verify)                               │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Pre-filled replay request                                                                       │
+│  Verdict ID: verdict-123                                                                         │
+│  Bundle: Platform Release 1.3.0-rc1  manifest sha256:beef...                                    │
+│  Baseline: Prod-EU-West                                                                          │
+│  Reason: [ Audit verification / policy change test ]                                             │
+│ [Request Replay]                                                                                │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Recent replay requests (for this promotion)                                                     │
+│  rr-001  COMPLETED  Feb 18, 08:30  match                                                        │
+│  rr-002  RUNNING    Feb 18, 07:30                                                               │
+│ Determinism: total 2 | matching 1 | mismatches 0 | 1 in progress                                │
+│ Link: [Open canonical Replay/Verify screen]                                                     │
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
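
One way to compute the determinism summary is to count only completed replays toward the match rate, so that a request still running is not reported as a mismatch. A minimal Python sketch under that assumption; `determinism_summary`, the status strings, and the tuple shape are hypothetical.

```python
from collections import Counter


def determinism_summary(replays):
    """Summarize replay requests.

    `replays` is a list of (status, outcome) tuples, where outcome is
    'match', 'mismatch', or None while the replay has not completed.
    """
    completed = [outcome for status, outcome in replays if status == "COMPLETED"]
    counts = Counter(completed)
    done = len(completed)
    return {
        "total": len(replays),
        "matching": counts["match"],
        "mismatches": counts["mismatch"],
        "in_progress": len(replays) - done,
        # The rate is undefined until at least one replay has completed.
        "match_rate": counts["match"] / done if done else None,
    }
```

Reporting `in_progress` separately keeps the headline rate meaningful while replays are still in flight.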
+
+---
+
+# 14.9 Screen — Schedule / Automation (promotion scheduling + link to Scheduler Runs)
+
+### Formerly
+
+* **Operations → Scheduler** existed (“Scheduler Runs”) but disconnected from promotions.
+* Release list had statuses but scheduling wasn’t first-class in the release context.
+
+### Why changed like this
+
+* Scheduling belongs to release operations, but we don’t want a new menu.
+* This screen:
+
+  * schedules this promotion (or a step),
+  * writes a scheduler job,
+  * then links to **Scheduler Runs** for execution diagnostics.
+
+### Screen graph (Mermaid)
+
+```mermaid
+flowchart LR
+A[Schedule Promotion] --> B[Choose time/window]
+A --> C["Choose constraints (feeds fresh, scans complete)"]
+A --> D[Create scheduler job]
+D --> E[View Scheduler Runs]
+E --> F[Back to Run Timeline]
+```
+
+### ASCII mock
+
+```text
+┌──────────────────────────────────────────────────────────────────────────────────────────────┐
+│ Schedule Promotion                                                                             │
+│ Legacy name/location: Ops → Scheduler (runs), no promotion-level scheduling UI                 │
+├──────────────────────────────────────────────────────────────────────────────────────────────┤
+│ Promotion: Hotfix Bundle 1.2.4 → US-East/us-prod                                                │
+│                                                                                                 │
+│ Schedule: [ Feb 19, 02:00 AM ]  Window: [ 2h ]                                                  │
+│ Preconditions:                                                                                  │
+│  [x] NVD/OSV feeds fresh (< 1h)                                                                 │
+│  [x] SBOM rescans complete                                                                      │
+│  [ ] Integrations healthy (warn only)                                                           │
+│                                                                                                 │
+│ [Create Schedule]   Link: [Open Scheduler Runs]                                                 │
+└──────────────────────────────────────────────────────────────────────────────────────────────┘
+```
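
The precondition checkboxes suggest two kinds of checks: ones that block a scheduled start and ones that only warn. A minimal Python sketch, assuming "warn only" means an unmet check is surfaced without preventing the run; `Precondition` and `evaluate_preconditions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Precondition:
    name: str
    satisfied: bool
    warn_only: bool = False


def evaluate_preconditions(preconditions):
    """Return (may_start, blockers, warnings) for a scheduled promotion."""
    blockers = [p.name for p in preconditions if not p.satisfied and not p.warn_only]
    warnings = [p.name for p in preconditions if not p.satisfied and p.warn_only]
    return (not blockers, blockers, warnings)


checks = [
    Precondition("NVD/OSV feeds fresh (< 1h)", satisfied=False),
    Precondition("SBOM rescans complete", satisfied=True),
    Precondition("Integrations healthy", satisfied=False, warn_only=True),
]
may_start, blockers, warnings = evaluate_preconditions(checks)
```

Here the stale feed blocks the start while the degraded integration is only reported, which is the distinction the Scheduler Runs diagnostics would need to show.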
+
+---
+
+## Result: what you gain with Pack 14
+
+* A promotion is now a **single auditable narrative**:
+
+  * timeline + logs + checkpoints,
+  * policy decision trace,
+  * deploy stage health gates,
+  * rollback controls,
+  * evidence sealing,
+  * deterministic replay.
+* Hybrid reachability becomes a **second-class** signal woven into checkpoints (Policy + Post-Deploy Verify), not a top-level section.
+* Existing PoC pages remain valid, but are now **linked meaningfully** from the run storyline.
+
+---
+
+Next: **Pack 15** unifies **Nightly Ops Report + Data Freshness** (feeds, rescans, integration degradation) into a single **Operations “Data Integrity”** view and shows how it bubbles up to Dashboard/Releases/Approvals without duplicating screens.