# component_architecture_scheduler.md — **StellaOps Scheduler** (2025Q4)
> **Scope.** Implementation-ready architecture for **Scheduler**: a service that (1) **re-evaluates** already-cataloged images when intel changes (Feedser/Vexer/policy), (2) orchestrates **nightly** and **ad-hoc** runs, (3) targets only the **impacted** images using the BOM-Index, and (4) emits **report-ready** events that downstream **Notify** fans out. Default mode is **analysis-only** (no image pull); optional **content-refresh** can be enabled per schedule.
---
## 0) Mission & boundaries
**Mission.** Keep scan results **current** without rescanning the world. When new advisories or VEX claims land, **pinpoint** affected images and ask the backend to recompute **verdicts** against the **existing SBOMs**. Surface only **meaningful deltas** to humans and ticket queues.
**Boundaries.**
* Scheduler **does not** compute SBOMs and **does not** sign. It calls Scanner.WebService's **/reports (analysis-only)** endpoint and lets the backend (Policy + Vexer + Feedser) decide PASS/FAIL.
* Scheduler **may** ask Scanner to **content-refresh** selected targets (e.g., mutable tags), but the default is **no** image pull.
* Notifications are **not** sent directly; Scheduler emits events consumed by **Notify**.
---
## 1) Runtime shape & projects
```
src/
├─ StellaOps.Scheduler.WebService/ # REST (schedules CRUD, runs, admin)
├─ StellaOps.Scheduler.Worker/ # planners + runners (N replicas)
├─ StellaOps.Scheduler.ImpactIndex/ # purl→images inverted index (roaring bitmaps)
├─ StellaOps.Scheduler.Models/ # DTOs (Schedule, Run, ImpactSet, Deltas)
├─ StellaOps.Scheduler.Storage.Mongo/ # schedules, runs, cursors, locks
├─ StellaOps.Scheduler.Queue/ # Redis Streams / NATS abstraction
├─ StellaOps.Scheduler.Tests.* # unit/integration/e2e
```
**Deployables**:
* **Scheduler.WebService** (stateless)
* **Scheduler.Worker** (scaleout; planners + executors)
**Dependencies**: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Feedser, Vexer, MongoDB, Redis/NATS, (optional) Notify.
---
## 2) Core responsibilities
1. **Time-based** runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
2. **Event-driven** runs: react to **Feedser export** and **Vexer export** deltas (changed product keys / advisories / claims).
3. **Impact targeting**: map changes to **image sets** using a **global inverted index** built from Scanner's per-image **BOM-Index** sidecars.
4. **Run planning**: shard, pace, and rate-limit jobs to avoid thundering herds.
5. **Execution**: call Scanner **/reports (analysis-only)** or **/scans (content-refresh)**; aggregate **delta** results.
6. **Events**: publish `rescan.delta` and `report.ready` summaries for **Notify** & **UI**.
7. **Control plane**: CRUD schedules, **pause/resume**, dry-run previews, audit.
---
## 3) Data model (Mongo)
**Database**: `scheduler`
* `schedules`
```
{ _id, tenantId, name, enabled, whenCron, timezone,
mode: "analysis-only" | "content-refresh",
selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
createdAt, updatedAt, createdBy, updatedBy }
```
* `runs`
```
{ _id, scheduleId?, tenantId, trigger: "cron|feedser|vexer|manual",
reason?: { feedserExportId?, vexerExportId?, cursor? },
state: "planning|queued|running|completed|error|cancelled",
stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
startedAt, finishedAt, error? }
```
* `impact_cursors`
```
{ _id: tenantId, feedserLastExportId, vexerLastExportId, updatedAt }
```
* `locks` (singleton schedulers, run leases)
* `audit` (CRUD actions, run outcomes)
**Indexes**:
* `schedules` on `{tenantId, enabled}`, `{whenCron}`.
* `runs` on `{tenantId, startedAt desc}`, `{state}`.
* TTL optional for completed runs (e.g., 180 days).
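These indexes could be created at service start-up with the MongoDB .NET driver, roughly as sketched below (the helper class name and the choice of `finishedAt` as the TTL field are assumptions):
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative start-up helper; collection and field names follow the shapes above.
public static class SchedulerMongoIndexes
{
    public static async Task EnsureAsync(IMongoDatabase db, CancellationToken ct = default)
    {
        var schedules = db.GetCollection<BsonDocument>("schedules");
        await schedules.Indexes.CreateManyAsync(new[]
        {
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("tenantId").Ascending("enabled")),
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("whenCron")),
        }, ct);

        var runs = db.GetCollection<BsonDocument>("runs");
        await runs.Indexes.CreateManyAsync(new[]
        {
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("tenantId").Descending("startedAt")),
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("state")),
            // Optional TTL for completed runs (e.g., 180 days), assuming finishedAt is set on completion.
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("finishedAt"),
                new CreateIndexOptions { ExpireAfter = TimeSpan.FromDays(180) }),
        }, ct);
    }
}
```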
---
## 4) ImpactIndex (global inverted index)
Goal: translate **change keys** → **image sets** in **milliseconds**.
**Source**: Scanner produces per-image **BOM-Index** sidecars (purls, and `usedByEntrypoint` bitmaps). Scheduler ingests/refreshes them to build a **global** index.
**Representation**:
* Assign **image IDs** (dense ints) to catalog images.
* Keep **Roaring Bitmaps**:
* `Contains[purl] → bitmap(imageIds)`
* `UsedBy[purl] → bitmap(imageIds)` (subset of Contains)
* Optionally keep **Owner maps**: `{imageId → {tenantId, namespaces[], repos[]}}` for selection filters.
* Persist in RocksDB/LMDB or Redis modules; cache hot shards in memory; snapshot to Mongo for cold start.
**Update paths**:
* On new/updated image SBOM: **merge** the per-image set into the global maps.
* On image remove/expiry: **clear** id from bitmaps.
**API (internal)**:
```csharp
public interface IImpactIndex
{
    ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);
    ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel); // optional (vuln->purl precomputed by Feedser)
    ImpactSet ResolveAll(Selector sel); // for nightly
}
```
**Selector filters**: tenant, namespaces, repos, labels, digest allowlists, `includeTags` patterns.
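A minimal sketch of how resolution could work over these maps, with `HashSet<int>` standing in for the roaring bitmaps and simplified `Selector`/`ImpactSet`/`ImageOwner` shapes (all names here are illustrative, not the actual `StellaOps.Scheduler.ImpactIndex` types):
```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the selector and impact-set shapes described above.
public sealed record ImageOwner(string TenantId, string[] Namespaces, string[] Repos);
public sealed record ImpactSet(IReadOnlyCollection<int> ImageIds);
public sealed record Selector(string TenantId, string[]? Namespaces = null)
{
    public bool Matches(ImageOwner owner) =>
        owner.TenantId == TenantId &&
        (Namespaces is null || owner.Namespaces.Intersect(Namespaces).Any());
}

// HashSet<int> stands in for the Contains/UsedBy roaring bitmaps.
public sealed class ImpactResolver
{
    private readonly Dictionary<string, HashSet<int>> _contains = new(); // purl → image ids
    private readonly Dictionary<string, HashSet<int>> _usedBy = new();   // purl → image ids (subset of Contains)
    private readonly Dictionary<int, ImageOwner> _owners = new();        // image id → ownership metadata

    // Merge one per-image BOM-Index into the global maps (the update path above).
    public void MergeImage(int imageId, ImageOwner owner, IEnumerable<string> purls, IEnumerable<string> usedPurls)
    {
        _owners[imageId] = owner;
        foreach (var p in purls) Add(_contains, p, imageId);
        foreach (var p in usedPurls) Add(_usedBy, p, imageId);
    }

    public ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel)
    {
        var map = usageOnly ? _usedBy : _contains;
        var union = new HashSet<int>();                   // bitmap OR across all changed purls
        foreach (var purl in purls)
            if (map.TryGetValue(purl, out var ids))
                union.UnionWith(ids);

        var filtered = union.Where(id => sel.Matches(_owners[id])).ToArray();
        return new ImpactSet(filtered);
    }

    private static void Add(Dictionary<string, HashSet<int>> map, string purl, int imageId)
    {
        if (!map.TryGetValue(purl, out var set)) map[purl] = set = new HashSet<int>();
        set.Add(imageId);
    }
}
```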
---
## 5) External interfaces (REST)
Base path: `/api/v1/scheduler` (Authority OpToks; scopes: `scheduler.read`, `scheduler.admin`).
### 5.1 Schedules CRUD
* `POST /schedules` → create
* `GET /schedules` → list (filter by tenant)
* `GET /schedules/{id}` → details + next run
* `PATCH /schedules/{id}` → pause/resume/update
* `DELETE /schedules/{id}` → delete (soft delete, optional)
### 5.2 Run control & introspection
* `POST /run` — ad-hoc run
```json
{ "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
```
* `GET /runs` — list with paging
* `GET /runs/{id}` — status, stats, links to deltas
* `POST /runs/{id}/cancel` — best-effort cancel
### 5.3 Previews (dry-run)
* `POST /preview/impact` — returns **candidate count** and a small sample of impacted digests for given change keys or selection.
### 5.4 Event webhooks (optional push from Feedser/Vexer)
* `POST /events/feedser-export`
```json
{ "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
```
* `POST /events/vexer-export`
```json
{ "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
```
**Security**: webhooks require **mTLS** or a signed `X-Scheduler-Signature` header (Ed25519 or HMAC-SHA-256), plus an Authority token.
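For the HMAC variant, verification could look like the sketch below, assuming the header carries a hex-encoded HMAC-SHA-256 over the raw request body (secret distribution and the exact header encoding are assumptions):
```csharp
using System;
using System.Security.Cryptography;

// Illustrative verifier for X-Scheduler-Signature; encoding and secret handling are assumptions.
public static class WebhookSignature
{
    public static bool Verify(byte[] rawBody, string signatureHeader, byte[] sharedSecret)
    {
        using var hmac = new HMACSHA256(sharedSecret);
        var expected = hmac.ComputeHash(rawBody);

        byte[] presented;
        try { presented = Convert.FromHexString(signatureHeader); }
        catch (FormatException) { return false; }

        // Constant-time comparison so timing does not reveal how many bytes matched.
        return CryptographicOperations.FixedTimeEquals(expected, presented);
    }
}
```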
---
## 6) Planner → Runner pipeline
### 6.1 Planning algorithm (event-driven)
```
On Export Event (Feedser/Vexer):
keys = Normalize(change payload) # productKeys or vulnIds→productKeys
usageOnly = schedule/policy hint? # default true
sel = Selector for tenant/scope from schedules subscribed to events
impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
impacted = ApplyOwnerFilters(impacted, sel) # namespaces/repos/labels
impacted = DeduplicateByDigest(impacted)
impacted = EnforceLimits(impacted, limits.maxJobs)
shards = Shard(impacted, byHashPrefix, n=limits.parallelism)
For each shard:
Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)
```
**Fairness & pacing**
* Use a **leaky bucket** per tenant and per registry host (see the pacing sketch below).
* Prioritize **KEV-tagged** and **critical** images first if oversubscribed.
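A pacing sketch using `System.Threading.RateLimiting` (called out in the implementation notes): a token bucket per tenant approximates the leaky bucket; the class name, burst size, and partitioning scheme are illustrative.
```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

// Token-bucket approximation of the per-tenant leaky bucket; numbers are illustrative.
public sealed class TenantPacer
{
    private readonly ConcurrentDictionary<string, RateLimiter> _buckets = new();
    private readonly int _ratePerSecond;

    public TenantPacer(int ratePerSecond) => _ratePerSecond = ratePerSecond;

    private RateLimiter GetBucket(string tenantId) =>
        _buckets.GetOrAdd(tenantId, _ => new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
        {
            TokenLimit = _ratePerSecond,                  // burst allowance
            TokensPerPeriod = _ratePerSecond,             // steady-state rate
            ReplenishmentPeriod = TimeSpan.FromSeconds(1),
            QueueLimit = int.MaxValue,                    // queue callers instead of rejecting
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            AutoReplenishment = true,
        }));

    // Await a permit before issuing the next report call for this tenant.
    public async Task WaitAsync(string tenantId, CancellationToken ct)
    {
        using var lease = await GetBucket(tenantId).AcquireAsync(1, ct);
        if (!lease.IsAcquired) throw new OperationCanceledException(ct);
    }
}
```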
### 6.2 Nightly planning
```
At cron tick:
sel = resolve selection
candidates = ImpactIndex.ResolveAll(sel)
if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
shard & enqueue as above
```
### 6.3 Execution (Runner)
* Pop **RunSegment** job → for each image digest:
  * **analysis-only**: `POST scanner/reports { imageDigest, policyRevision? }`
  * **content-refresh**: resolve tag→digest if needed; `POST scanner/scans { imageRef, attest? false }` then `POST /reports`
* Collect **delta**: `newFindings`, `newCriticals`/`highs`, `links` (UI deep link, Rekor if present).
* Persist per-image outcome in `runs.{id}.stats` (incremental counters).
* Emit `scheduler.rescan.delta` events to **Notify** only when **delta > 0** and matches severity rule.
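A condensed sketch of the analysis-only leg of this loop, reusing the `TenantPacer` above; the `HttpClient` is assumed to point at Scanner.WebService, and the request/response records are simplified placeholders for the real report contract.
```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;

// Simplified shapes; the real /reports payload and response are richer.
public sealed record ReportRequest(string ImageDigest, string? PolicyRevision);
public sealed record ReportDelta(string ImageDigest, int NewCriticals, int NewHigh);

public sealed class SegmentRunner
{
    private readonly HttpClient _scanner;   // BaseAddress = Scanner.WebService
    private readonly TenantPacer _pacer;

    public SegmentRunner(HttpClient scanner, TenantPacer pacer) => (_scanner, _pacer) = (scanner, pacer);

    public async Task<ReportDelta?> ProcessAsync(string tenantId, string digest, string? policyRevision, CancellationToken ct)
    {
        await _pacer.WaitAsync(tenantId, ct);

        for (var attempt = 1; attempt <= 5; attempt++)
        {
            using var resp = await _scanner.PostAsJsonAsync("/reports", new ReportRequest(digest, policyRevision), ct);
            if (resp.StatusCode == HttpStatusCode.TooManyRequests)
            {
                // Backoff with jitter when Scanner sheds load (see failure modes below).
                var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt))
                          + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500));
                await Task.Delay(delay, ct);
                continue;
            }
            resp.EnsureSuccessStatusCode();
            return await resp.Content.ReadFromJsonAsync<ReportDelta>(cancellationToken: ct);
        }
        return null; // segment retry will pick this digest up again (idempotent keys)
    }
}
```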
---
## 7) Event model (outbound)
**Topic**: `rescan.delta` (internal bus → Notify; UI subscribes via backend).
```json
{
"tenant": "tenant-01",
"runId": "324af…",
"imageDigest": "sha256:…",
"newCriticals": 1,
"newHigh": 2,
"kevHits": ["CVE-2025-..."],
"topFindings": [
{ "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
],
"reportUrl": "https://ui/.../scans/sha256:.../report",
"attestation": { "uuid":"rekor-uuid", "verified": true },
"ts": "2025-10-18T03:12:45Z"
}
```
**Also**: `report.ready` for “no-change” summaries (digest + zero delta), which Notify can ignore by rule.
---
## 8) Security posture
* **AuthN/Z**: Authority OpToks with `aud=scheduler`; DPoP (preferred) or mTLS.
* **Multi-tenant**: every schedule, run, and event carries `tenantId`; ImpactIndex filters by tenant-visible images.
* **Webhook** callers (Feedser/Vexer) present **mTLS** or **HMAC** and Authority token.
* **Input hardening**: size caps on changed key lists; reject >100k keys per event; compress (zstd/gzip) allowed with limits.
* **No secrets** in logs; redact tokens and signatures.
---
## 9) Observability & SLOs
**Metrics (Prometheus)**
* `scheduler.events_total{source, result}`
* `scheduler.impact_resolve_seconds{quantile}`
* `scheduler.images_selected_total{mode}`
* `scheduler.jobs_enqueued_total{mode}`
* `scheduler.run_latency_seconds{quantile}` // event → first verdict
* `scheduler.delta_images_total{severity}`
* `scheduler.rate_limited_total{reason}`
**Targets**
* Resolve 10k changed keys → impacted set in **<300ms** (hot cache).
* Event → first rescan verdict in **≤60s** (p95).
* Nightly coverage of 50k images in **≤10min** with 10 workers (analysis-only).
**Tracing** (OTEL): spans `plan`, `resolve`, `enqueue`, `report_call`, `persist`, `emit`.
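A sketch of how a few of the instruments above might be registered with `System.Diagnostics.Metrics` (the meter name is an assumption; instrument names mirror the list):
```csharp
using System.Diagnostics.Metrics;

// Meter name is illustrative; instrument names mirror the metrics list above.
public static class SchedulerMetrics
{
    private static readonly Meter Meter = new("StellaOps.Scheduler");

    public static readonly Counter<long> EventsTotal =
        Meter.CreateCounter<long>("scheduler.events_total");
    public static readonly Histogram<double> ImpactResolveSeconds =
        Meter.CreateHistogram<double>("scheduler.impact_resolve_seconds", unit: "s");
    public static readonly Counter<long> DeltaImagesTotal =
        Meter.CreateCounter<long>("scheduler.delta_images_total");

    public static void RecordEvent(string source, string result) =>
        EventsTotal.Add(1, new("source", source), new("result", result));
}
```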
---
## 10) Configuration (YAML)
```yaml
scheduler:
authority:
issuer: "https://authority.internal"
require: "dpop" # or "mtls"
queue:
kind: "redis" # or "nats"
url: "redis://redis:6379/4"
mongo:
uri: "mongodb://mongo/scheduler"
impactIndex:
storage: "rocksdb" # "rocksdb" | "redis" | "memory"
warmOnStart: true
usageOnlyDefault: true
limits:
defaultRatePerSecond: 50
defaultParallelism: 8
maxJobsPerRun: 50000
integrates:
scannerUrl: "https://scanner-web.internal"
feedserWebhook: true
vexerWebhook: true
notifications:
emitBus: "internal" # deliver to Notify via internal bus
```
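A minimal options-binding sketch for this block, assuming the YAML is surfaced through `IConfiguration` (e.g., via a YAML configuration provider); the POCO names simply mirror the keys above and only a subset is shown:
```csharp
// POCOs mirroring the "scheduler" section above (subset).
public sealed class SchedulerOptions
{
    public ImpactIndexOptions ImpactIndex { get; set; } = new();
    public LimitsOptions Limits { get; set; } = new();

    public sealed class ImpactIndexOptions
    {
        public string Storage { get; set; } = "rocksdb";
        public bool WarmOnStart { get; set; } = true;
        public bool UsageOnlyDefault { get; set; } = true;
    }

    public sealed class LimitsOptions
    {
        public int DefaultRatePerSecond { get; set; } = 50;
        public int DefaultParallelism { get; set; } = 8;
        public int MaxJobsPerRun { get; set; } = 50000;
    }
}

// In the WebService/Worker host (Program.cs), the section would be bound once:
// builder.Services.Configure<SchedulerOptions>(builder.Configuration.GetSection("scheduler"));
```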
---
## 11) UI touchpoints
* **Schedules** page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
* **Runs** page: timeline; heatmap of deltas; drill-down to affected images.
* **Dry-run preview** modal: “This Feedser export touches ~3,214 images; projected deltas: ~420 (34 KEV).”
---
## 12) Failure modes & degradations
| Condition | Behavior |
| ------------------------------------ | ---------------------------------------------------------------------------------------- |
| ImpactIndex cold / incomplete | Fall back to **All** selection for nightly; for events, cap to KEV+critical until warmed |
| Feedser/Vexer webhook storm | Coalesce by exportId; debounce 30–60s; keep last |
| Scanner under load (429) | Backoff with jitter; respect per-tenant leaky bucket |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spill over to next window; UI banner shows backlog |
| Notify down | Buffer outbound events in queue (TTL 24h) |
| Mongo slow | Cut batch sizes; sample-log; alert ops; don't drop runs unless critical |
---
## 13) Testing matrix
* **ImpactIndex**: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
* **Planner**: dedupe, shard, fairness, limit enforcement, KEV prioritization.
* **Runner**: parallel report calls, error backoff, partial failures, idempotency.
* **End-to-end**: Feedser export → deltas visible in UI in ≤60s.
* **Security**: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
* **Chaos**: drop scanner availability; simulate registry throttles (content-refresh mode).
* **Nightly**: cron tick correctness across timezones and DST.
---
## 14) Implementation notes
* **Language**: .NET 10 minimal API; Channels-based pipeline; `System.Threading.RateLimiting`.
* **Bitmaps**: Roaring via `RoaringBitmap` bindings; memory-map large shards if RocksDB is used.
* **Cron**: Quartz-style parser with timezone support; clock skew tolerated ±60s.
* **Dry-run**: use ImpactIndex only; never call the scanner.
* **Idempotency**: run segments carry deterministic keys (see the sketch below); retries are safe.
* **Backpressure**: per-tenant buckets; per-host registry budgets respected when content-refresh is enabled.
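A sketch of one way to derive those deterministic keys, assuming they are computed from the run id, shard index, and image digest so that retries and duplicate deliveries collapse onto the same identity:
```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Deterministic identity for a run-segment job; the input fields and format are illustrative.
public static class SegmentKey
{
    public static string Compute(string runId, int shardIndex, string imageDigest)
    {
        var material = $"{runId}|{shardIndex}|{imageDigest}";
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(material));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```
Workers can then upsert queue entries and run-state documents on this key, so re-enqueueing the same segment is a no-op.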
---
## 15) Sequences (representative)
**A) Event-driven rescan (Feedser delta)**
```mermaid
sequenceDiagram
autonumber
participant FE as Feedser
participant SCH as Scheduler.Worker
participant IDX as ImpactIndex
participant SC as Scanner.WebService
participant NO as Notify
FE->>SCH: POST /events/feedser-export {exportId, changedProductKeys}
SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
IDX-->>SCH: bitmap(imageIds) → digests list
SCH->>SC: POST /reports {imageDigest} (batch/sequenced)
SC-->>SCH: report deltas (new criticals/highs)
alt delta>0
SCH->>NO: rescan.delta {digest, newCriticals, links}
end
```
**B) Nightly rescan**
```mermaid
sequenceDiagram
autonumber
participant CRON as Cron
participant SCH as Scheduler.Worker
participant IDX as ImpactIndex
participant SC as Scanner.WebService
CRON->>SCH: tick (02:00 Europe/Sofia)
SCH->>IDX: ResolveAll(selector)
IDX-->>SCH: candidates
SCH->>SC: POST /reports {digest} (paced)
SC-->>SCH: results
SCH-->>SCH: aggregate, store run stats
```
**C) Content-refresh (tag followers)**
```mermaid
sequenceDiagram
autonumber
participant SCH as Scheduler
participant SC as Scanner
SCH->>SC: resolve tag→digest (if changed)
alt digest changed
SCH->>SC: POST /scans {imageRef} # new SBOM
SC-->>SCH: scan complete (artifacts)
SCH->>SC: POST /reports {imageDigest}
else unchanged
SCH->>SC: POST /reports {imageDigest} # analysis-only
end
```
---
## 16) Roadmap
* **Vuln-centric impact**: pre-join vuln→purl→images to rank by **KEV** and **exploited-in-the-wild** signals.
* **Policy diff preview**: when a staged policy changes, show projected breakage set before promotion.
* **Cross-cluster federation**: one Scheduler instance driving many Scanner clusters (tenant isolation).
* **Windows containers**: integrate Zastava runtime hints for Usage view tightening.
---
**End — component_architecture_scheduler.md**