StellaOps Console - Runs Workspace

Audience: Scheduler Guild, Console UX, operators, support engineers.
Scope: Runs dashboard, live progress, queue management, diffs, retries, evidence downloads, observability, troubleshooting, and offline behaviour (Sprint 23).

The Runs workspace surfaces Scheduler activity across tenants: upcoming schedules, active runs, progress, deltas, and evidence bundles. It helps operators monitor backlog, drill into run segments, and recover from failures without leaving the console.


1. Access and prerequisites

  • Route: /console/runs (list) with detail drawer /console/runs/:runId. SSE stream at /console/runs/:runId/stream.
  • Scopes: runs.read (baseline), runs.manage (cancel/retry), policy:runs (view policy deltas), downloads.read (evidence bundles).
  • Dependencies: Scheduler WebService (/runs, /schedules, /preview), Scheduler Worker event feeds, Policy Engine run summaries, Scanner WebService evidence endpoints.
  • Feature flags: runs.dashboard.enabled, runs.sse.enabled, runs.retry.enabled, runs.evidenceBundles.
  • Tenancy: Tenant selector filters list; cross-tenant admins can pin multiple tenants side-by-side (split view).
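
The scope list above implies a simple gating rule for console actions. The sketch below is illustrative only: the scope names come from this page, while the action names and the assumption that runs.read is required for every action are hypothetical.

```python
# Scope names are taken from this page; the action labels are hypothetical.
# Assumption: runs.read is the baseline and is required for every action.
REQUIRED_SCOPES = {
    "view_runs": {"runs.read"},
    "cancel_run": {"runs.read", "runs.manage"},
    "retry_run": {"runs.read", "runs.manage"},
    "view_policy_deltas": {"runs.read", "policy:runs"},
    "download_evidence": {"runs.read", "downloads.read"},
}


def allowed_actions(granted):
    """Return the sorted list of actions whose required scopes are all granted."""
    granted = set(granted)
    return sorted(
        action
        for action, needed in REQUIRED_SCOPES.items()
        if needed <= granted  # subset test: every required scope is present
    )
```

For example, a token holding only runs.read would be limited to viewing runs under these assumptions.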

2. Layout overview

+-------------------------------------------------------------------+
| Header: Tenant badge - schedule selector - backlog metrics        |
+-------------------------------------------------------------------+
| Cards: Active runs - Queue depth - New findings - KEV deltas      |
+-------------------------------------------------------------------+
| Tabs: Active | Completed | Scheduled | Failures                   |
+-------------------------------------------------------------------+
| Runs table (virtualised)                                          |
|  Columns: Run ID | Trigger | State | Progress | Duration | Deltas |
+-------------------------------------------------------------------+
| Detail drawer: Summary | Segments | Deltas | Evidence | Logs      |
+-------------------------------------------------------------------+

The header integrates the status ticker to show ingestion deltas and planner heartbeat.


3. Runs table

| Column | Description |
|--------|-------------|
| Run ID | Deterministic identifier (run:<tenant>:<timestamp>:<nonce>). Clicking opens detail drawer. |
| Trigger | cron, manual, feedser, vexer, policy, content-refresh. Tooltip lists schedule and initiator. |
| State | Badges: planning, queued, running, completed, cancelled, error. Errors include error code (e.g., ERR_RUN_005). |
| Progress | Percentage + processed/total candidates. SSE updates increment in real time. |
| Duration | Elapsed time (auto-updating). Completed runs show total duration; running runs show timer. |
| Deltas | Count of findings deltas (+critical, +high, -quieted, etc.). Tooltip expands severity breakdown. |

Row badges include KEV first, Content refresh, Policy promotion follow-up, and Retry. Selecting multiple rows enables bulk downloads and exports.

Filters: trigger type, state, schedule, severity impact (critical/high), policy revision, timeframe, planner shard, error code.
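
The Run ID format run:<tenant>:<timestamp>:<nonce> can be split into its parts for filtering or deep links. This is a minimal sketch, not the console's parser. It assumes the tenant and nonce contain no colons while the timestamp may (for example an ISO 8601 value); the page does not specify the timestamp format.

```python
from typing import NamedTuple


class RunId(NamedTuple):
    tenant: str
    timestamp: str
    nonce: str


def parse_run_id(value: str) -> RunId:
    """Split 'run:<tenant>:<timestamp>:<nonce>' into its three parts.

    Assumption: tenant and nonce are colon-free, so the tenant is the first
    field after the prefix and the nonce is the last field; whatever sits
    between them is treated as the timestamp.
    """
    prefix, sep, rest = value.partition(":")
    if prefix != "run" or not sep:
        raise ValueError(f"not a run identifier: {value!r}")
    tenant, _, rest = rest.partition(":")
    timestamp, _, nonce = rest.rpartition(":")
    if not (tenant and timestamp and nonce):
        raise ValueError(f"incomplete run identifier: {value!r}")
    return RunId(tenant, timestamp, nonce)
```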


4. Detail drawer

Sections:

  1. Summary - run metadata (tenant, trigger, linked schedule, planner shard count, started/finished timestamps, correlation ID).
  2. Progress - segmented progress bar (planner, queue, execution, post-processing). Real-time updates via SSE; includes throughput (targets per minute).
  3. Segments - table of run segments with state, target count, executor, retry count. Operators can retry failed segments individually (requires runs.manage).
  4. Deltas - summary of findings changes (new findings, resolved findings, severity shifts, KEV additions). Links to Findings view filtered by run ID.
  5. Evidence - links to evidence bundles (JSON manifest, DSSE attestation), policy run records, and explain bundles. Download buttons use /console/exports orchestration.
  6. Logs - last 50 structured log entries with severity, message, correlation ID; scroll-to-live for streaming logs. The "Open in logs" action copies a query for external log tooling.
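
The progress figures shown in the table and the drawer (percentage, processed/total, targets per minute) reduce to simple arithmetic. The helper below is a sketch; the function name and the returned field names are hypothetical, not the console's data model.

```python
def progress_summary(processed: int, total: int, elapsed_seconds: float) -> dict:
    """Derive the displayed progress values from raw counters.

    Guards against division by zero while a run is still planning
    (total == 0) or has only just started (elapsed_seconds == 0).
    """
    percent = round(100.0 * processed / total, 1) if total > 0 else 0.0
    per_minute = (
        round(processed * 60.0 / elapsed_seconds, 1) if elapsed_seconds > 0 else 0.0
    )
    return {
        "percent": percent,
        "label": f"{processed}/{total}",
        "targetsPerMinute": per_minute,
    }
```

With 150 of 600 candidates processed after 90 seconds, this yields 25.0 percent and a throughput of 100.0 targets per minute.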

5. Queue and schedule management

  • Schedule side panel lists upcoming jobs with cron expressions, time zones, and enable toggles.
  • Queue depth chart shows current backlog per tenant and schedule (planner backlog, executor backlog).
  • "Preview impact" button opens modal for manual run planning (purls or vuln IDs) and shows impacted image count before launch. CLI parity: stella runs preview --tenant <id> --file keys.json.
  • Manual run form allows selecting mode (analysis-only, content-refresh), scope, and optional policy snapshot.
  • Pausing a schedule requires confirmation; UI displays earliest next run after resume.

6. Live updates and SSE stream

  • SSE endpoint /console/runs/:runId/stream streams JSON events (stateChanged, segmentProgress, deltaSummary, log). UI reconnects with exponential backoff and heartbeat.
  • Global ticker shows planner heartbeat age; banner warns after 90 seconds of silence.
  • Offline mode disables SSE and falls back to polling every 30 seconds.
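
To make the stream handling concrete, the sketch below parses the standard text/event-stream framing into (event, payload) pairs, computes an exponential backoff schedule, and applies the 90-second heartbeat rule. The event names come from this page; the payload fields in the usage example, the backoff base and cap, and all function names are assumptions.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, json_payload) for each event in an SSE line stream.

    Follows the text/event-stream framing: 'event:' and 'data:' fields,
    ':'-prefixed comment lines, and a blank line that dispatches the event.
    Payloads are assumed to be JSON objects.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # dispatch only when at least one data line was seen
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 6) -> List[float]:
    """Reconnect delays in seconds: doubling from base, clamped to cap (assumed values)."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def heartbeat_is_stale(age_seconds: float, limit: float = 90.0) -> bool:
    """True once the planner heartbeat has been silent for longer than the limit."""
    return age_seconds > limit
```

A short usage example with made-up payloads:

```python
stream = [
    ": keep-alive\n",
    "event: stateChanged\n",
    'data: {"state": "running"}\n',
    "\n",
]
for name, payload in parse_sse(stream):
    print(name, payload)
```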

7. Retry and remediation

  • Failed segments show retry button; UI displays reason and cooldown timers. Retry actions are scope-gated and logged.
  • Full run retry resets segments while preserving original run metadata; new run ID references previous run in retryOf field.
  • "Escalate to support" button opens incident template pre-filled with run context and correlation IDs.
  • Troubleshooting quick links:
    • ERR_RUN_001 (planner lock)
    • ERR_RUN_005 (Scanner timeout)
    • ERR_RUN_009 (impact index stale)
      Each link points to corresponding runbook sections (docs/modules/scheduler/operations/worker.md).
  • CLI parity: stella runs retry --run <id>, stella runs cancel --run <id>.
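
The full-run retry rule above can be pictured as a copy of the previous run record with fresh segments and a back-reference. Only the retryOf field is named on this page; every other field name, and the choice of "planning" and "queued" as the reset states, is a hypothetical assumption for illustration.

```python
import copy


def build_full_retry(previous: dict, new_run_id: str) -> dict:
    """Return a new run record that retries `previous` without mutating it.

    Metadata (tenant, trigger, schedule, ...) is carried over unchanged,
    segments are reset, and retryOf points back at the original run.
    """
    retry = copy.deepcopy(previous)
    retry["runId"] = new_run_id
    retry["retryOf"] = previous["runId"]
    retry["state"] = "planning"  # assumed initial run state
    for segment in retry.get("segments", []):
        segment["state"] = "queued"  # assumed initial segment state
        segment["processed"] = 0
    return retry
```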

8. Evidence downloads

  • Evidence tab aggregates:
    • Policy run summary (/policy/runs/{id})
    • Findings delta CSV (/downloads/findings/{runId}.csv)
    • Scanner evidence bundle (compressed JSON with manifest)
  • Downloads show size, hash, signature status.
  • "Bundle for offline" packages all evidence into single tarball with manifest/digest; UI notes CLI parity (stella runs export --run <id> --bundle).
  • Completed bundles stored in Downloads workspace for reuse (links provided).
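
As an illustration of the "Bundle for offline" idea (a single tarball carrying a manifest with digests), the sketch below packs in-memory files with a SHA-256 manifest and verifies them on read. The manifest.json name and its fields are assumptions; the actual bundle layout is not described on this page.

```python
import hashlib
import io
import json
import tarfile
from typing import Mapping


def build_offline_bundle(files: Mapping[str, bytes]) -> bytes:
    """Pack files plus a manifest.json (name, size, sha256 per file) into a tar archive."""
    manifest = {
        "files": [
            {"name": name, "size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        ]
    }
    members = dict(files)
    members["manifest.json"] = json.dumps(manifest, sort_keys=True, indent=2).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(members.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so repeated builds do not differ by time
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def verify_bundle(bundle: bytes) -> bool:
    """Recompute each file's SHA-256 and compare it with the manifest entry."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        for entry in manifest["files"]:
            data = tar.extractfile(entry["name"]).read()
            if hashlib.sha256(data).hexdigest() != entry["sha256"]:
                return False
    return True
```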

9. Observability

  • Metrics cards: scheduler_queue_depth, scheduler_runs_active, scheduler_runs_error_total, scheduler_runs_duration_seconds.
  • Trend charts: queue depth (last 24h), runs per trigger, average duration, determinism score.
  • Alert banners: planner lag > SLA, queue depth > threshold, repeated error codes.
  • Telemetry panel lists latest events (e.g., scheduler.run.started, scheduler.run.completed, scheduler.run.failed).
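
The three alert banner conditions can be expressed as a small evaluation function. This is a sketch: the SLA, threshold, and repeat limit defaults are placeholders, since this page does not state the real values.

```python
from collections import Counter
from typing import Iterable, List


def alert_banners(
    planner_lag_s: float,
    queue_depth: int,
    recent_error_codes: Iterable[str],
    *,
    lag_sla_s: float = 60.0,      # placeholder SLA
    depth_threshold: int = 500,   # placeholder threshold
    repeat_limit: int = 3,        # placeholder: occurrences that count as "repeated"
) -> List[str]:
    """Return the banner messages that apply to the current metrics."""
    banners = []
    if planner_lag_s > lag_sla_s:
        banners.append("planner lag > SLA")
    if queue_depth > depth_threshold:
        banners.append("queue depth > threshold")
    repeated = sorted(
        code for code, count in Counter(recent_error_codes).items() if count >= repeat_limit
    )
    if repeated:
        banners.append("repeated error codes: " + ", ".join(repeated))
    return banners
```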

10. Offline and air-gap behaviour

  • Offline banner highlights snapshot timestamp and indicates SSE disabled.
  • Manual run form switches to generate CLI script for offline execution (stella runs submit --bundle <file>).
  • Evidence download buttons output local paths; UI reminds to copy to removable media.
  • Queue charts use snapshot data; manual refresh button loads latest records from Offline Kit.
  • Tenants absent from the snapshot are hidden to avoid partial data.
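
When the manual run form generates a CLI script, paths copied from removable media may contain spaces. The sketch below builds the stella runs submit --bundle <file> command quoted above with shell-safe quoting; the function name is hypothetical and no other flags are assumed.

```python
import shlex


def offline_submit_command(bundle_path: str) -> str:
    """Build a shell-safe 'stella runs submit --bundle <file>' command line."""
    return shlex.join(["stella", "runs", "submit", "--bundle", bundle_path])
```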

11. Screenshot coordination

  • Placeholders:
    • ![Runs dashboard placeholder](../assets/ui/runs/dashboard-placeholder.png)
    • ![Run detail placeholder](../assets/ui/runs/detail-placeholder.png)
  • Coordinate with Scheduler Guild for updated screenshots after Sprint 23 UI stabilises (tracked in #console-screenshots, entry 2025-10-26).

12. References

  • /docs/ui/console-overview.md - shell, SSE ticker.
  • /docs/ui/navigation.md - route map and deep links.
  • /docs/ui/findings.md - findings filtered by run.
  • /docs/ui/downloads.md - download manager, export retention, CLI parity.
  • /docs/modules/scheduler/architecture.md - scheduler architecture and data model.
  • /docs/policy/runs.md - policy run integration.
  • /docs/modules/cli/guides/policy.md (section 5) - CLI parity (runs commands pending).
  • /docs/modules/scheduler/operations/worker.md - troubleshooting.

13. Compliance checklist

  • Runs table columns, filters, and states described.
  • Detail drawer sections documented (segments, deltas, evidence, logs).
  • Queue management, manual run, and preview coverage included.
  • SSE and live update behaviour detailed.
  • Retry, remediation, and runbook references provided.
  • Evidence downloads and bundle workflows documented with CLI parity.
  • Offline behaviour and screenshot coordination recorded.
  • References validated.

Last updated: 2025-10-26 (Sprint 23).