# component_architecture_scheduler.md — **StellaOps Scheduler** (2025Q4)
> **Scope.** Implementation-ready architecture for **Scheduler**: a service that (1) **re-evaluates** already-cataloged images when intel changes (Feedser/Vexer/policy), (2) orchestrates **nightly** and **ad-hoc** runs, (3) targets only the **impacted** images using the BOM-Index, and (4) emits **report-ready** events that downstream **Notify** fans out. Default mode is **analysis-only** (no image pull); optional **content-refresh** can be enabled per schedule.
---
## 0) Mission & boundaries
**Mission.** Keep scan results **current** without rescanning the world. When new advisories or VEX claims land, **pinpoint** affected images and ask the backend to recompute **verdicts** against the **existing SBOMs**. Surface only **meaningful deltas** to humans and ticket queues.
**Boundaries.**
* Scheduler **does not** compute SBOMs and **does not** sign. It calls Scanner.WebService's **/reports (analysis-only)** endpoint and lets the backend (Policy + Vexer + Feedser) decide PASS/FAIL.
* Scheduler **may** ask Scanner to **content-refresh** selected targets (e.g., mutable tags), but the default is **no** image pull.
* Notifications are **not** sent directly; Scheduler emits events consumed by **Notify**.
---
## 1) Runtime shape & projects
```
src/
├─ StellaOps.Scheduler.WebService/ # REST (schedules CRUD, runs, admin)
├─ StellaOps.Scheduler.Worker/ # planners + runners (N replicas)
├─ StellaOps.Scheduler.ImpactIndex/ # purl→images inverted index (roaring bitmaps)
├─ StellaOps.Scheduler.Models/ # DTOs (Schedule, Run, ImpactSet, Deltas)
├─ StellaOps.Scheduler.Storage.Mongo/ # schedules, runs, cursors, locks
├─ StellaOps.Scheduler.Queue/ # Redis Streams / NATS abstraction
├─ StellaOps.Scheduler.Tests.* # unit/integration/e2e
```
**Deployables**:
* **Scheduler.WebService** (stateless)
* **Scheduler.Worker** (scaleout; planners + executors)
**Dependencies**: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Feedser, Vexer, MongoDB, Redis/NATS, (optional) Notify.
---
## 2) Core responsibilities
1. **Time-based** runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
2. **Event-driven** runs: react to **Feedser export** and **Vexer export** deltas (changed product keys / advisories / claims).
3. **Impact targeting**: map changes to **image sets** using a **global inverted index** built from Scanner's per-image **BOM-Index** sidecars.
4. **Run planning**: shard, pace, and rate-limit jobs to avoid thundering herds.
5. **Execution**: call Scanner **/reports (analysis-only)** or **/scans (content-refresh)**; aggregate **delta** results.
6. **Events**: publish `rescan.delta` and `report.ready` summaries for **Notify** & **UI**.
7. **Control plane**: CRUD schedules, **pause/resume**, dry-run previews, audit.
---
## 3) Data model (Mongo)
**Database**: `scheduler`
* `schedules`
```
{ _id, tenantId, name, enabled, whenCron, timezone,
mode: "analysis-only" | "content-refresh",
selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
createdAt, updatedAt, createdBy, updatedBy }
```
* `runs`
```
{ _id, scheduleId?, tenantId, trigger: "cron|feedser|vexer|manual",
reason?: { feedserExportId?, vexerExportId?, cursor? },
state: "planning|queued|running|completed|error|cancelled",
stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
startedAt, finishedAt, error? }
```
* `impact_cursors`
```
{ _id: tenantId, feedserLastExportId, vexerLastExportId, updatedAt }
```
* `locks` (singleton schedulers, run leases)
* `audit` (CRUD actions, run outcomes)
**Indexes**:
* `schedules` on `{tenantId, enabled}`, `{whenCron}`.
* `runs` on `{tenantId, startedAt desc}`, `{state}`.
* TTL optional for completed runs (e.g., 180 days).
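These indexes could be created at service start-up with the MongoDB .NET driver, roughly as sketched below (the helper class name and the choice of `finishedAt` as the TTL field are assumptions):
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Illustrative start-up helper; collection and field names follow the shapes above.
public static class SchedulerMongoIndexes
{
    public static async Task EnsureAsync(IMongoDatabase db, CancellationToken ct = default)
    {
        var schedules = db.GetCollection<BsonDocument>("schedules");
        await schedules.Indexes.CreateManyAsync(new[]
        {
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("tenantId").Ascending("enabled")),
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("whenCron")),
        }, ct);

        var runs = db.GetCollection<BsonDocument>("runs");
        await runs.Indexes.CreateManyAsync(new[]
        {
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("tenantId").Descending("startedAt")),
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("state")),
            // Optional TTL for completed runs (e.g., 180 days), assuming finishedAt is set on completion.
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys.Ascending("finishedAt"),
                new CreateIndexOptions { ExpireAfter = TimeSpan.FromDays(180) }),
        }, ct);
    }
}
```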
---
## 4) ImpactIndex (global inverted index)
Goal: translate **change keys** → **image sets** in **milliseconds**.
**Source**: Scanner produces per-image **BOM-Index** sidecars (purls, and `usedByEntrypoint` bitmaps). Scheduler ingests/refreshes them to build a **global** index.
**Representation**:
* Assign **image IDs** (dense ints) to catalog images.
* Keep **Roaring Bitmaps**:
* `Contains[purl] → bitmap(imageIds)`
* `UsedBy[purl] → bitmap(imageIds)` (subset of Contains)
* Optionally keep **Owner maps**: `{imageId → {tenantId, namespaces[], repos[]}}` for selection filters.
* Persist in RocksDB/LMDB or Redis modules; cache hot shards in memory; snapshot to Mongo for cold start.
**Update paths**:
* On new/updated image SBOM: **merge** the per-image set into the global maps.
* On image remove/expiry: **clear** id from bitmaps.
**API (internal)**:
```csharp
public interface IImpactIndex
{
    ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);
    ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel); // optional (vuln->purl precomputed by Feedser)
    ImpactSet ResolveAll(Selector sel); // for nightly
}
```
**Selector filters**: tenant, namespaces, repos, labels, digest allowlists, `includeTags` patterns.
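A minimal sketch of how resolution could work over these maps, with `HashSet<int>` standing in for the roaring bitmaps and simplified `Selector`/`ImpactSet`/`ImageOwner` shapes (all names here are illustrative, not the actual `StellaOps.Scheduler.ImpactIndex` types):
```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the selector and impact-set shapes described above.
public sealed record ImageOwner(string TenantId, string[] Namespaces, string[] Repos);
public sealed record ImpactSet(IReadOnlyCollection<int> ImageIds);
public sealed record Selector(string TenantId, string[]? Namespaces = null)
{
    public bool Matches(ImageOwner owner) =>
        owner.TenantId == TenantId &&
        (Namespaces is null || owner.Namespaces.Intersect(Namespaces).Any());
}

// HashSet<int> stands in for the Contains/UsedBy roaring bitmaps.
public sealed class ImpactResolver
{
    private readonly Dictionary<string, HashSet<int>> _contains = new(); // purl → image ids
    private readonly Dictionary<string, HashSet<int>> _usedBy = new();   // purl → image ids (subset of Contains)
    private readonly Dictionary<int, ImageOwner> _owners = new();        // image id → ownership metadata

    // Merge one per-image BOM-Index into the global maps (the update path above).
    public void MergeImage(int imageId, ImageOwner owner, IEnumerable<string> purls, IEnumerable<string> usedPurls)
    {
        _owners[imageId] = owner;
        foreach (var p in purls) Add(_contains, p, imageId);
        foreach (var p in usedPurls) Add(_usedBy, p, imageId);
    }

    public ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel)
    {
        var map = usageOnly ? _usedBy : _contains;
        var union = new HashSet<int>();                   // bitmap OR across all changed purls
        foreach (var purl in purls)
            if (map.TryGetValue(purl, out var ids))
                union.UnionWith(ids);

        var filtered = union.Where(id => sel.Matches(_owners[id])).ToArray();
        return new ImpactSet(filtered);
    }

    private static void Add(Dictionary<string, HashSet<int>> map, string purl, int imageId)
    {
        if (!map.TryGetValue(purl, out var set)) map[purl] = set = new HashSet<int>();
        set.Add(imageId);
    }
}
```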
---
## 5) External interfaces (REST)
Base path: `/api/v1/scheduler` (Authority OpToks; scopes: `scheduler.read`, `scheduler.admin`).
### 5.1 Schedules CRUD
* `POST /schedules` → create
* `GET /schedules` → list (filter by tenant)
* `GET /schedules/{id}` → details + next run
* `PATCH /schedules/{id}` → pause/resume/update
* `DELETE /schedules/{id}` → delete (soft delete, optional)
### 5.2 Run control & introspection
* `POST /run` — ad-hoc run
```json
{ "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
```
* `GET /runs` — list with paging
* `GET /runs/{id}` — status, stats, links to deltas
* `POST /runs/{id}/cancel` — best-effort cancel
### 5.3 Previews (dry-run)
* `POST /preview/impact` — returns **candidate count** and a small sample of impacted digests for given change keys or selection.
### 5.4 Event webhooks (optional push from Feedser/Vexer)
* `POST /events/feedser-export`
```json
{ "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
```
* `POST /events/vexer-export`
```json
{ "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
```
**Security**: webhooks require **mTLS** or a signed `X-Scheduler-Signature` header (Ed25519 or HMAC-SHA-256), plus an Authority token.
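For the HMAC variant, verification could look like the sketch below, assuming the header carries a hex-encoded HMAC-SHA-256 over the raw request body (secret distribution and the exact header encoding are assumptions):
```csharp
using System;
using System.Security.Cryptography;

// Illustrative verifier for X-Scheduler-Signature; encoding and secret handling are assumptions.
public static class WebhookSignature
{
    public static bool Verify(byte[] rawBody, string signatureHeader, byte[] sharedSecret)
    {
        using var hmac = new HMACSHA256(sharedSecret);
        var expected = hmac.ComputeHash(rawBody);

        byte[] presented;
        try { presented = Convert.FromHexString(signatureHeader); }
        catch (FormatException) { return false; }

        // Constant-time comparison so timing does not reveal how many bytes matched.
        return CryptographicOperations.FixedTimeEquals(expected, presented);
    }
}
```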
---
## 6) Planner → Runner pipeline
### 6.1 Planning algorithm (event-driven)
```
On Export Event (Feedser/Vexer):
keys = Normalize(change payload) # productKeys or vulnIds→productKeys
usageOnly = schedule/policy hint? # default true
sel = Selector for tenant/scope from schedules subscribed to events
impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
impacted = ApplyOwnerFilters(impacted, sel) # namespaces/repos/labels
impacted = DeduplicateByDigest(impacted)
impacted = EnforceLimits(impacted, limits.maxJobs)
shards = Shard(impacted, byHashPrefix, n=limits.parallelism)
For each shard:
Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)
```
**Fairness & pacing**
* Use a **leaky bucket** per tenant and per registry host (see the pacing sketch below).
* Prioritize **KEV-tagged** and **critical** images first if oversubscribed.
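A pacing sketch using `System.Threading.RateLimiting` (called out in the implementation notes): a token bucket per tenant approximates the leaky bucket; the class name, burst size, and partitioning scheme are illustrative.
```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

// Token-bucket approximation of the per-tenant leaky bucket; numbers are illustrative.
public sealed class TenantPacer
{
    private readonly ConcurrentDictionary<string, RateLimiter> _buckets = new();
    private readonly int _ratePerSecond;

    public TenantPacer(int ratePerSecond) => _ratePerSecond = ratePerSecond;

    private RateLimiter GetBucket(string tenantId) =>
        _buckets.GetOrAdd(tenantId, _ => new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
        {
            TokenLimit = _ratePerSecond,                  // burst allowance
            TokensPerPeriod = _ratePerSecond,             // steady-state rate
            ReplenishmentPeriod = TimeSpan.FromSeconds(1),
            QueueLimit = int.MaxValue,                    // queue callers instead of rejecting
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            AutoReplenishment = true,
        }));

    // Await a permit before issuing the next report call for this tenant.
    public async Task WaitAsync(string tenantId, CancellationToken ct)
    {
        using var lease = await GetBucket(tenantId).AcquireAsync(1, ct);
        if (!lease.IsAcquired) throw new OperationCanceledException(ct);
    }
}
```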
### 6.2 Nightly planning
```
At cron tick:
sel = resolve selection
candidates = ImpactIndex.ResolveAll(sel)
if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
shard & enqueue as above
```
### 6.3 Execution (Runner)
* Pop **RunSegment** job → for each image digest:
  * **analysis-only**: `POST scanner/reports { imageDigest, policyRevision? }`
  * **content-refresh**: resolve tag→digest if needed; `POST scanner/scans { imageRef, attest? false }` then `POST /reports`
* Collect **delta**: `newFindings`, `newCriticals`/`highs`, `links` (UI deep link, Rekor if present).
* Persist per-image outcome in `runs.{id}.stats` (incremental counters).
* Emit `scheduler.rescan.delta` events to **Notify** only when **delta > 0** and matches severity rule.
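A condensed sketch of the analysis-only leg of this loop, reusing the `TenantPacer` above; the `HttpClient` is assumed to point at Scanner.WebService, and the request/response records are simplified placeholders for the real report contract.
```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;

// Simplified shapes; the real /reports payload and response are richer.
public sealed record ReportRequest(string ImageDigest, string? PolicyRevision);
public sealed record ReportDelta(string ImageDigest, int NewCriticals, int NewHigh);

public sealed class SegmentRunner
{
    private readonly HttpClient _scanner;   // BaseAddress = Scanner.WebService
    private readonly TenantPacer _pacer;

    public SegmentRunner(HttpClient scanner, TenantPacer pacer) => (_scanner, _pacer) = (scanner, pacer);

    public async Task<ReportDelta?> ProcessAsync(string tenantId, string digest, string? policyRevision, CancellationToken ct)
    {
        await _pacer.WaitAsync(tenantId, ct);

        for (var attempt = 1; attempt <= 5; attempt++)
        {
            using var resp = await _scanner.PostAsJsonAsync("/reports", new ReportRequest(digest, policyRevision), ct);
            if (resp.StatusCode == HttpStatusCode.TooManyRequests)
            {
                // Backoff with jitter when Scanner sheds load (see failure modes below).
                var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt))
                          + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500));
                await Task.Delay(delay, ct);
                continue;
            }
            resp.EnsureSuccessStatusCode();
            return await resp.Content.ReadFromJsonAsync<ReportDelta>(cancellationToken: ct);
        }
        return null; // segment retry will pick this digest up again (idempotent keys)
    }
}
```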
---
## 7) Event model (outbound)
**Topic**: `rescan.delta` (internal bus → Notify; UI subscribes via backend).
```json
{
"tenant": "tenant-01",
"runId": "324af…",
"imageDigest": "sha256:…",
"newCriticals": 1,
"newHigh": 2,
"kevHits": ["CVE-2025-..."],
"topFindings": [
{ "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
],
"reportUrl": "https://ui/.../scans/sha256:.../report",
"attestation": { "uuid":"rekor-uuid", "verified": true },
"ts": "2025-10-18T03:12:45Z"
}
```
**Also**: `report.ready` for “no-change” summaries (digest + zero delta), which Notify can ignore by rule.
---
## 8) Security posture
* **AuthN/Z**: Authority OpToks with `aud=scheduler`; DPoP (preferred) or mTLS.
* **Multi-tenant**: every schedule, run, and event carries `tenantId`; ImpactIndex filters by tenant-visible images.
* **Webhook** callers (Feedser/Vexer) present **mTLS** or **HMAC** and Authority token.
* **Input hardening**: size caps on changed key lists; reject >100k keys per event; compress (zstd/gzip) allowed with limits.
* **No secrets** in logs; redact tokens and signatures.
---
## 9) Observability & SLOs
**Metrics (Prometheus)**
* `scheduler.events_total{source, result}`
* `scheduler.impact_resolve_seconds{quantile}`
* `scheduler.images_selected_total{mode}`
* `scheduler.jobs_enqueued_total{mode}`
* `scheduler.run_latency_seconds{quantile}` // event → first verdict
* `scheduler.delta_images_total{severity}`
* `scheduler.rate_limited_total{reason}`
**Targets**
* Resolve 10k changed keys → impacted set in **<300ms** (hot cache).
* Event → first rescan verdict in **≤60s** (p95).
* Nightly coverage of 50k images in **≤10min** with 10 workers (analysis-only).
**Tracing** (OTEL): spans `plan`, `resolve`, `enqueue`, `report_call`, `persist`, `emit`.
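A sketch of how a few of the instruments above might be registered with `System.Diagnostics.Metrics` (the meter name is an assumption; instrument names mirror the list):
```csharp
using System.Diagnostics.Metrics;

// Meter name is illustrative; instrument names mirror the metrics list above.
public static class SchedulerMetrics
{
    private static readonly Meter Meter = new("StellaOps.Scheduler");

    public static readonly Counter<long> EventsTotal =
        Meter.CreateCounter<long>("scheduler.events_total");
    public static readonly Histogram<double> ImpactResolveSeconds =
        Meter.CreateHistogram<double>("scheduler.impact_resolve_seconds", unit: "s");
    public static readonly Counter<long> DeltaImagesTotal =
        Meter.CreateCounter<long>("scheduler.delta_images_total");

    public static void RecordEvent(string source, string result) =>
        EventsTotal.Add(1, new("source", source), new("result", result));
}
```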
---
## 10) Configuration (YAML)
```yaml
scheduler:
authority:
issuer: "https://authority.internal"
require: "dpop" # or "mtls"
queue:
kind: "redis" # or "nats"
url: "redis://redis:6379/4"
mongo:
uri: "mongodb://mongo/scheduler"
impactIndex:
storage: "rocksdb" # "rocksdb" | "redis" | "memory"
warmOnStart: true
usageOnlyDefault: true
limits:
defaultRatePerSecond: 50
defaultParallelism: 8
maxJobsPerRun: 50000
integrates:
scannerUrl: "https://scanner-web.internal"
feedserWebhook: true
vexerWebhook: true
notifications:
emitBus: "internal" # deliver to Notify via internal bus
```
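A minimal options-binding sketch for this block, assuming the YAML is surfaced through `IConfiguration` (e.g., via a YAML configuration provider); the POCO names simply mirror the keys above and only a subset is shown:
```csharp
// POCOs mirroring the "scheduler" section above (subset).
public sealed class SchedulerOptions
{
    public ImpactIndexOptions ImpactIndex { get; set; } = new();
    public LimitsOptions Limits { get; set; } = new();

    public sealed class ImpactIndexOptions
    {
        public string Storage { get; set; } = "rocksdb";
        public bool WarmOnStart { get; set; } = true;
        public bool UsageOnlyDefault { get; set; } = true;
    }

    public sealed class LimitsOptions
    {
        public int DefaultRatePerSecond { get; set; } = 50;
        public int DefaultParallelism { get; set; } = 8;
        public int MaxJobsPerRun { get; set; } = 50000;
    }
}

// In the WebService/Worker host (Program.cs), the section would be bound once:
// builder.Services.Configure<SchedulerOptions>(builder.Configuration.GetSection("scheduler"));
```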
---
## 11) UI touchpoints
* **Schedules** page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
* **Runs** page: timeline; heatmap of deltas; drill-down to affected images.
* **Dry-run preview** modal: “This Feedser export touches ~3,214 images; projected deltas: ~420 (34 KEV).”
---
## 12) Failure modes & degradations
| Condition | Behavior |
| ------------------------------------ | ---------------------------------------------------------------------------------------- |
| ImpactIndex cold / incomplete | Fall back to **All** selection for nightly; for events, cap to KEV+critical until warmed |
| Feedser/Vexer webhook storm | Coalesce by exportId; debounce 30–60s; keep last |
| Scanner under load (429) | Backoff with jitter; respect per-tenant leaky bucket |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spill over to next window; UI banner shows backlog |
| Notify down | Buffer outbound events in queue (TTL 24h) |
| Mongo slow | Cut batch sizes; sample-log; alert ops; don't drop runs unless critical |
---
## 13) Testing matrix
* **ImpactIndex**: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
* **Planner**: dedupe, shard, fairness, limit enforcement, KEV prioritization.
* **Runner**: parallel report calls, error backoff, partial failures, idempotency.
* **End-to-end**: Feedser export → deltas visible in UI in ≤60s.
* **Security**: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
* **Chaos**: drop scanner availability; simulate registry throttles (content-refresh mode).
* **Nightly**: cron tick correctness across timezones and DST.
---
## 14) Implementation notes
* **Language**: .NET 10 minimal API; Channels-based pipeline; `System.Threading.RateLimiting`.
* **Bitmaps**: Roaring via `RoaringBitmap` bindings; memory-map large shards if RocksDB is used.
* **Cron**: Quartz-style parser with timezone support; clock skew tolerated ±60s.
* **Dry-run**: use ImpactIndex only; never call the scanner.
* **Idempotency**: run segments carry deterministic keys (see the sketch below); retries are safe.
* **Backpressure**: per-tenant buckets; per-host registry budgets respected when content-refresh is enabled.
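A sketch of one way to derive those deterministic keys, assuming they are computed from the run id, shard index, and image digest so that retries and duplicate deliveries collapse onto the same identity:
```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Deterministic identity for a run-segment job; the input fields and format are illustrative.
public static class SegmentKey
{
    public static string Compute(string runId, int shardIndex, string imageDigest)
    {
        var material = $"{runId}|{shardIndex}|{imageDigest}";
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(material));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```
Workers can then upsert queue entries and run-state documents on this key, so re-enqueueing the same segment is a no-op.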
---
## 15) Sequences (representative)
**A) Event-driven rescan (Feedser delta)**
```mermaid
sequenceDiagram
autonumber
participant FE as Feedser
participant SCH as Scheduler.Worker
participant IDX as ImpactIndex
participant SC as Scanner.WebService
participant NO as Notify
FE->>SCH: POST /events/feedser-export {exportId, changedProductKeys}
SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
IDX-->>SCH: bitmap(imageIds) → digests list
SCH->>SC: POST /reports {imageDigest} (batch/sequenced)
SC-->>SCH: report deltas (new criticals/highs)
alt delta>0
SCH->>NO: rescan.delta {digest, newCriticals, links}
end
```
**B) Nightly rescan**
```mermaid
sequenceDiagram
autonumber
participant CRON as Cron
participant SCH as Scheduler.Worker
participant IDX as ImpactIndex
participant SC as Scanner.WebService
CRON->>SCH: tick (02:00 Europe/Sofia)
SCH->>IDX: ResolveAll(selector)
IDX-->>SCH: candidates
SCH->>SC: POST /reports {digest} (paced)
SC-->>SCH: results
SCH-->>SCH: aggregate, store run stats
```
**C) Content-refresh (tag followers)**
```mermaid
sequenceDiagram
autonumber
participant SCH as Scheduler
participant SC as Scanner
SCH->>SC: resolve tag→digest (if changed)
alt digest changed
SCH->>SC: POST /scans {imageRef} # new SBOM
SC-->>SCH: scan complete (artifacts)
SCH->>SC: POST /reports {imageDigest}
else unchanged
SCH->>SC: POST /reports {imageDigest} # analysis-only
end
```
---
## 16) Roadmap
* **Vuln-centric impact**: pre-join vuln→purl→images to rank by **KEV** and **exploited-in-the-wild** signals.
* **Policy diff preview**: when a staged policy changes, show projected breakage set before promotion.
* **Cross-cluster federation**: one Scheduler instance driving many Scanner clusters (tenant isolation).
* **Windows containers**: integrate Zastava runtime hints for Usage view tightening.
---
**End — component_architecture_scheduler.md**