
component_architecture_scheduler.md — StellaOps Scheduler (2025Q4)

Scope. Implementation-ready architecture for Scheduler: a service that (1) re-evaluates already-cataloged images when intel changes (Feedser/Vexer/policy), (2) orchestrates nightly and ad-hoc runs, (3) targets only the impacted images using the BOM-Index, and (4) emits report-ready events that downstream Notify fans out. Default mode is analysis-only (no image pull); optional content-refresh can be enabled per schedule.


0) Mission & boundaries

Mission. Keep scan results current without rescanning the world. When new advisories or VEX claims land, pinpoint affected images and ask the backend to recompute verdicts against the existing SBOMs. Surface only meaningful deltas to humans and ticket queues.

Boundaries.

  • Scheduler does not compute SBOMs and does not sign. It calls Scanner.WebService's /reports (analysis-only) endpoint and lets the backend (Policy + Vexer + Feedser) decide PASS/FAIL.
  • Scheduler may ask Scanner to content-refresh selected targets (e.g., mutable tags), but the default is no image pull.
  • Notifications are not sent directly; Scheduler emits events consumed by Notify.

1) Runtime shape & projects

src/
 ├─ StellaOps.Scheduler.WebService/      # REST (schedules CRUD, runs, admin)
 ├─ StellaOps.Scheduler.Worker/          # planners + runners (N replicas)
 ├─ StellaOps.Scheduler.ImpactIndex/     # purl→images inverted index (roaring bitmaps)
 ├─ StellaOps.Scheduler.Models/          # DTOs (Schedule, Run, ImpactSet, Deltas)
 ├─ StellaOps.Scheduler.Storage.Mongo/   # schedules, runs, cursors, locks
 ├─ StellaOps.Scheduler.Queue/           # Redis Streams / NATS abstraction
 ├─ StellaOps.Scheduler.Tests.*          # unit/integration/e2e

Deployables:

  • Scheduler.WebService (stateless)
  • Scheduler.Worker (scale-out; planners + executors)

Dependencies: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Feedser, Vexer, MongoDB, Redis/NATS, (optional) Notify.


2) Core responsibilities

  1. Time-based runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
  2. Event-driven runs: react to Feedser and Vexer export deltas (changed product keys / advisories / claims).
  3. Impact targeting: map changes to image sets using a global inverted index built from Scanner's per-image BOM-Index sidecars.
  4. Run planning: shard, pace, and rate-limit jobs to avoid thundering herds.
  5. Execution: call Scanner /reports (analysis-only) or /scans (content-refresh); aggregate delta results.
  6. Events: publish rescan.delta and report.ready summaries for Notify & UI.
  7. Control plane: CRUD schedules, pause/resume, dry-run previews, audit.

3) Data model (Mongo)

Database: scheduler

  • schedules

    { _id, tenantId, name, enabled, whenCron, timezone,
      mode: "analysis-only" | "content-refresh",
      selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
                   includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
      onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
      notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
      limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
      createdAt, updatedAt, createdBy, updatedBy }
    
  • runs

    { _id, scheduleId?, tenantId, trigger: "cron|feedser|vexer|manual",
      reason?: { feedserExportId?, vexerExportId?, cursor? },
      state: "planning|queued|running|completed|error|cancelled",
      stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
      startedAt, finishedAt, error? }
    
  • impact_cursors

    { _id: tenantId, feedserLastExportId, vexerLastExportId, updatedAt }
    
  • locks (singleton schedulers, run leases)

  • audit (CRUD actions, run outcomes)

Indexes:

  • schedules on {tenantId, enabled}, {whenCron}.
  • runs on {tenantId, startedAt desc}, {state}.
  • TTL optional for completed runs (e.g., 180 days).
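
A minimal sketch of creating these indexes with the MongoDB .NET driver; collection and field names follow the model above, and the TTL value mirrors the 180-day example:

using MongoDB.Bson;
using MongoDB.Driver;

// Sketch: index bootstrap for the scheduler database, using the
// collections and fields named in the data model above.
var db = new MongoClient("mongodb://mongo/scheduler").GetDatabase("scheduler");

var schedules = db.GetCollection<BsonDocument>("schedules");
await schedules.Indexes.CreateOneAsync(new CreateIndexModel<BsonDocument>(
    Builders<BsonDocument>.IndexKeys.Ascending("tenantId").Ascending("enabled")));
await schedules.Indexes.CreateOneAsync(new CreateIndexModel<BsonDocument>(
    Builders<BsonDocument>.IndexKeys.Ascending("whenCron")));

var runs = db.GetCollection<BsonDocument>("runs");
await runs.Indexes.CreateOneAsync(new CreateIndexModel<BsonDocument>(
    Builders<BsonDocument>.IndexKeys.Ascending("tenantId").Descending("startedAt")));
await runs.Indexes.CreateOneAsync(new CreateIndexModel<BsonDocument>(
    Builders<BsonDocument>.IndexKeys.Ascending("state")));

// Optional TTL on completed runs (e.g., 180 days), keyed on finishedAt.
await runs.Indexes.CreateOneAsync(new CreateIndexModel<BsonDocument>(
    Builders<BsonDocument>.IndexKeys.Ascending("finishedAt"),
    new CreateIndexOptions { ExpireAfter = TimeSpan.FromDays(180) }));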

4) ImpactIndex (global inverted index)

Goal: translate change keys → image sets in milliseconds.

Source: Scanner produces per-image BOM-Index sidecars (purls plus usedByEntrypoint bitmaps). Scheduler ingests/refreshes them to build a global index.

Representation:

  • Assign image IDs (dense ints) to catalog images.

  • Keep Roaring Bitmaps:

    • Contains[purl] → bitmap(imageIds)
    • UsedBy[purl] → bitmap(imageIds) (subset of Contains)
  • Optionally keep Owner maps: {imageId → {tenantId, namespaces[], repos[]}} for selection filters.

  • Persist in RocksDB/LMDB or Redis modules; cache hot shards in memory; snapshot to Mongo for cold start.

Update paths:

  • On new/updated image SBOM: merge the per-image set into the global maps.
  • On image remove/expiry: clear id from bitmaps.

API (internal):

IImpactIndex {
  ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);
  ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel); // optional (vuln->purl precomputed by Feedser)
  ImpactSet ResolveAll(Selector sel); // for nightly
}

Selector filters: tenant, namespaces, repos, labels, digest allowlists, includeTags patterns.
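
A sketch of ResolveByPurls over the representation above. ImpactSet and Selector are the DTOs from the interface; the RoaringBitmap type with Empty/Or/And, and the helpers ResolveSelectorBitmap and FromImageIds, stand in for whichever roaring binding and mapping code are actually used:

// Sketch: purl → image-set resolution over the in-memory maps from §4.
// RoaringBitmap is a placeholder for the bound roaring library.
sealed partial class ImpactIndex
{
    // Global inverted maps, populated by the SBOM ingest path.
    readonly Dictionary<string, RoaringBitmap> contains = new();
    readonly Dictionary<string, RoaringBitmap> usedBy = new();

    public ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel)
    {
        // usedBy ⊆ contains, so usage-only resolution just swaps the map.
        var map = usageOnly ? usedBy : contains;

        var acc = RoaringBitmap.Empty;
        foreach (var purl in purls)
            if (map.TryGetValue(purl, out var bm))
                acc = RoaringBitmap.Or(acc, bm);       // union across change keys

        // Tenant/namespace/repo/label selection is itself a bitmap, so the
        // owner filters collapse into a single intersection.
        acc = RoaringBitmap.And(acc, ResolveSelectorBitmap(sel));
        return ImpactSet.FromImageIds(acc);
    }
}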


5) External interfaces (REST)

Base path: /api/v1/scheduler (Authority OpToks; scopes: scheduler.read, scheduler.admin).

5.1 Schedules CRUD

  • POST /schedules → create
  • GET /schedules → list (filter by tenant)
  • GET /schedules/{id} → details + next run
  • PATCH /schedules/{id} → pause/resume/update
  • DELETE /schedules/{id} → delete (soft delete, optional)

5.2 Run control & introspection

  • POST /run — ad-hoc run

    { "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
    
  • GET /runs — list with paging

  • GET /runs/{id} — status, stats, links to deltas

  • POST /runs/{id}/cancel — best-effort cancel

5.3 Previews (dry-run)

  • POST /preview/impact — returns candidate count and a small sample of impacted digests for given change keys or selection.

5.4 Event webhooks (optional push from Feedser/Vexer)

  • POST /events/feedser-export

    { "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
    
  • POST /events/vexer-export

    { "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
    

Security: webhook callers must present mTLS or a signed X-Scheduler-Signature header (Ed25519 or HMAC-SHA256), plus an Authority token.
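
For the HMAC variant, verification could look like the sketch below: recompute HMAC-SHA256 over the raw request body and compare against the presented header in constant time. The per-caller secret lookup and the hex encoding of the header value are assumptions:

using System.Security.Cryptography;

// Sketch: HMAC-SHA256 check of X-Scheduler-Signature over the raw body.
// Per-caller secret lookup and hex-encoded header value are assumptions.
static class WebhookAuth
{
    public static bool VerifySignature(byte[] rawBody, string signatureHeader, byte[] sharedSecret)
    {
        using var hmac = new HMACSHA256(sharedSecret);
        byte[] expected = hmac.ComputeHash(rawBody);

        byte[] presented;
        try { presented = Convert.FromHexString(signatureHeader); }
        catch (FormatException) { return false; }

        // Constant-time comparison avoids leaking a timing side channel.
        return CryptographicOperations.FixedTimeEquals(expected, presented);
    }
}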


6) Planner → Runner pipeline

6.1 Planning algorithm (event-driven)

On Export Event (Feedser/Vexer):
  keys = Normalize(change payload)          # productKeys or vulnIds→productKeys
  usageOnly = schedule/policy hint?         # default true
  sel = Selector for tenant/scope from schedules subscribed to events

  impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
  impacted = ApplyOwnerFilters(impacted, sel)           # namespaces/repos/labels
  impacted = DeduplicateByDigest(impacted)
  impacted = EnforceLimits(impacted, limits.maxJobs)
  shards    = Shard(impacted, byHashPrefix, n=limits.parallelism)

  For each shard:
    Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)

Fairness & pacing

  • Use leaky bucket per tenant and per registry host.
  • Prioritize KEV-tagged and critical findings first if oversubscribed.
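
Per-tenant pacing can lean on System.Threading.RateLimiting (named in §14). The sketch below uses its token-bucket limiter with steady replenishment, which approximates the leaky bucket described above; the partition key and the numbers are illustrative:

using System.Threading.RateLimiting;

// Sketch: one token bucket per tenant, replenished once a second so the
// drain rate matches limits.ratePerSecond. Values are illustrative.
var limiter = PartitionedRateLimiter.Create<string, string>(tenantId =>
    RateLimitPartition.GetTokenBucketLimiter(tenantId, _ => new TokenBucketRateLimiterOptions
    {
        TokenLimit = 50,                               // burst ceiling
        TokensPerPeriod = 50,                          // ≈ ratePerSecond
        ReplenishmentPeriod = TimeSpan.FromSeconds(1),
        QueueLimit = 10_000,                           // jobs allowed to wait
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    }));

// Planner loop: wait for tenant budget, then enqueue the RunSegment.
using RateLimitLease lease = await limiter.AcquireAsync("tenant-01", 1, CancellationToken.None);
if (lease.IsAcquired)
{
    // EnqueueRunSegment(runId, shard);   // hypothetical enqueue helper
}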

6.2 Nightly planning

At cron tick:
  sel = resolve selection
  candidates = ImpactIndex.ResolveAll(sel)
  if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
  shard & enqueue as above

6.3 Execution (Runner)

  • Pop RunSegment job → for each image digest:

    • analysis-only: POST scanner /reports { imageDigest, policyRevision? }
    • content-refresh: resolve tag→digest if needed; POST scanner /scans { imageRef, attest: false }, then POST /reports
  • Collect delta: newFindings, newCriticals/highs, links (UI deep link, Rekor if present).

  • Persist per-image outcomes in runs.{id}.stats (incremental counters).

  • Emit scheduler.rescan.delta events to Notify only when delta > 0 and matches severity rule.
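
A sketch of the analysis-only call with backoff, assuming the /reports route above; ReportDelta is a hypothetical DTO for the response, and the retry constants are illustrative:

using System.Net;
using System.Net.Http.Json;

// Hypothetical response shape for the delta summary.
public sealed record ReportDelta(int NewCriticals, int NewHigh);

static class ReportRunner
{
    // Sketch: analysis-only report call with exponential backoff + full
    // jitter on 429 (see §12). Five attempts is an illustrative cap.
    public static async Task<ReportDelta?> RequestReportAsync(
        HttpClient scanner, string imageDigest, CancellationToken ct)
    {
        for (var attempt = 0; attempt < 5; attempt++)
        {
            using var resp = await scanner.PostAsJsonAsync("/reports", new { imageDigest }, ct);

            if (resp.StatusCode == HttpStatusCode.TooManyRequests)
            {
                // Honor Retry-After when present; otherwise exponential
                // base delay, scaled by full jitter.
                var delay = resp.Headers.RetryAfter?.Delta
                            ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));
                await Task.Delay(delay * Random.Shared.NextDouble(), ct);
                continue;
            }

            resp.EnsureSuccessStatusCode();
            return await resp.Content.ReadFromJsonAsync<ReportDelta>(cancellationToken: ct);
        }
        return null;   // give up; segment retries are idempotent (§14)
    }
}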


7) Event model (outbound)

Topic: rescan.delta (internal bus → Notify; UI subscribes via backend).

{
  "tenant": "tenant-01",
  "runId": "324af…",
  "imageDigest": "sha256:…",
  "newCriticals": 1,
  "newHigh": 2,
  "kevHits": ["CVE-2025-..."],
  "topFindings": [
    { "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
  ],
  "reportUrl": "https://ui/.../scans/sha256:.../report",
  "attestation": { "uuid":"rekor-uuid", "verified": true },
  "ts": "2025-10-18T03:12:45Z"
}

Also: report.ready for “no-change” summaries (digest + zero delta), which Notify can ignore by rule.


8) Security posture

  • AuthN/Z: Authority OpToks with aud=scheduler; DPoP (preferred) or mTLS.
  • Multi-tenant: every schedule, run, and event carries tenantId; ImpactIndex filters by tenant-visible images.
  • Webhook callers (Feedser/Vexer) present mTLS or HMAC and Authority token.
  • Input hardening: size caps on changed-key lists; reject >100k keys per event; compressed payloads (zstd/gzip) accepted within limits.
  • No secrets in logs; redact tokens and signatures.

9) Observability & SLOs

Metrics (Prometheus)

  • scheduler.events_total{source, result}
  • scheduler.impact_resolve_seconds{quantile}
  • scheduler.images_selected_total{mode}
  • scheduler.jobs_enqueued_total{mode}
  • scheduler.run_latency_seconds{quantile} // event → first verdict
  • scheduler.delta_images_total{severity}
  • scheduler.rate_limited_total{reason}
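
These instruments map naturally onto System.Diagnostics.Metrics, which an OTEL exporter can publish to Prometheus; the meter name and the subset shown are illustrative:

using System.Diagnostics.Metrics;

// Sketch: a few of the instruments listed above. The meter name is an
// assumption; tags carry the label dimensions from the list.
static class SchedulerMetrics
{
    static readonly Meter Meter = new("StellaOps.Scheduler");

    public static readonly Counter<long> EventsTotal =
        Meter.CreateCounter<long>("scheduler.events_total");

    public static readonly Histogram<double> ImpactResolveSeconds =
        Meter.CreateHistogram<double>("scheduler.impact_resolve_seconds", unit: "s");

    public static readonly Counter<long> DeltaImagesTotal =
        Meter.CreateCounter<long>("scheduler.delta_images_total");
}

// Usage: SchedulerMetrics.EventsTotal.Add(1,
//     new KeyValuePair<string, object?>("source", "feedser"),
//     new KeyValuePair<string, object?>("result", "ok"));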

Targets

  • Resolve 10k changed keys → impacted set in <300ms (hot cache).
  • Event → first rescan verdict in ≤60s (p95).
  • Nightly coverage of 50k images in ≤10min with 10 workers (analysis-only).

Tracing (OTEL): spans plan, resolve, enqueue, report_call, persist, emit.


10) Configuration (YAML)

scheduler:
  authority:
    issuer: "https://authority.internal"
    require: "dpop"            # or "mtls"
  queue:
    kind: "redis"              # or "nats"
    url: "redis://redis:6379/4"
  mongo:
    uri: "mongodb://mongo/scheduler"
  impactIndex:
    storage: "rocksdb"         # "rocksdb" | "redis" | "memory"
    warmOnStart: true
    usageOnlyDefault: true
  limits:
    defaultRatePerSecond: 50
    defaultParallelism: 8
    maxJobsPerRun: 50000
  integrates:
    scannerUrl: "https://scanner-web.internal"
    feedserWebhook: true
    vexerWebhook: true
  notifications:
    emitBus: "internal"        # deliver to Notify via internal bus

11) UI touchpoints

  • Schedules page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
  • Runs page: timeline; heatmap of deltas; drill-down to affected images.
  • Dry-run preview modal: “This Feedser export touches ~3,214 images; projected deltas: ~420 (34 KEV).”

12) Failure modes & degradations

| Condition | Behavior |
| --- | --- |
| ImpactIndex cold / incomplete | Fall back to all-images selection for nightly; for events, cap to KEV + critical until warmed |
| Feedser/Vexer webhook storm | Coalesce by exportId; debounce 30-60s; keep the last payload |
| Scanner under load (429) | Back off with jitter; respect the per-tenant leaky bucket |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spill over to the next window; UI banner shows the backlog |
| Notify down | Buffer outbound events in the queue (TTL 24h) |
| Mongo slow | Cut batch sizes; sample-log; alert ops; don't drop runs unless critical |
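
For the webhook-storm row, coalescing might look like the sketch below: pending events are keyed by exportId and flushed after a quiet period, with keep-last semantics. ExportEvent and the 45s delay are illustrative:

using System.Collections.Concurrent;

// Sketch: coalesce webhook bursts per exportId, debounce, keep last.
sealed class ExportDebouncer
{
    readonly ConcurrentDictionary<string, ExportEvent> _pending = new();
    readonly Func<ExportEvent, Task> _plan;                 // hand-off to the planner
    readonly TimeSpan _quiet = TimeSpan.FromSeconds(45);    // inside the 30-60s window

    public ExportDebouncer(Func<ExportEvent, Task> plan) => _plan = plan;

    public void Offer(ExportEvent evt)
    {
        // Keep-last: a newer payload for the same exportId replaces the
        // pending one; only the first arrival schedules the flush.
        if (_pending.TryAdd(evt.ExportId, evt))
            _ = FlushAfterQuietAsync(evt.ExportId);
        else
            _pending[evt.ExportId] = evt;
    }

    async Task FlushAfterQuietAsync(string exportId)
    {
        await Task.Delay(_quiet);
        if (_pending.TryRemove(exportId, out var evt))
            await _plan(evt);
    }
}

public sealed record ExportEvent(string ExportId, IReadOnlyList<string> ChangedProductKeys);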

13) Testing matrix

  • ImpactIndex: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
  • Planner: dedupe, shard, fairness, limit enforcement, KEV prioritization.
  • Runner: parallel report calls, error backoff, partial failures, idempotency.
  • End-to-end: Feedser export → deltas visible in UI in ≤60s.
  • Security: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
  • Chaos: drop Scanner availability; simulate registry throttles (content-refresh mode).
  • Nightly: cron tick correctness across timezones and DST.

14) Implementation notes

  • Language: .NET 10 minimal API; Channels-based pipeline; System.Threading.RateLimiting.
  • Bitmaps: Roaring via RoaringBitmap bindings; memory-map large shards if RocksDB is used.
  • Cron: Quartz-style parser with timezone support; clock skew tolerated ±60s.
  • Dry-run: uses the ImpactIndex only; never calls Scanner.
  • Idempotency: run segments carry deterministic keys; retries are safe.
  • Backpressure: per-tenant buckets; per-host registry budgets respected when content-refresh is enabled.
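
The cron parser is not pinned above; Cronos is one Quartz-style library with timezone support, so here is a next-occurrence sketch under that assumption (DST transitions are handled by the library, and the ±60s tolerance mirrors the note above):

using Cronos;   // assumed choice of Quartz-style cron parser

// Sketch: next occurrence of the "02:00 Europe/Sofia" schedule.
var expr = CronExpression.Parse("0 2 * * *");
var tz = TimeZoneInfo.FindSystemTimeZoneById("Europe/Sofia");
DateTime? nextUtc = expr.GetNextOccurrence(DateTime.UtcNow, tz);

// A tick is due if it lands within the tolerated clock skew of the
// computed occurrence.
bool IsDue(DateTime tickUtc) =>
    nextUtc is { } n && (tickUtc - n).Duration() <= TimeSpan.FromSeconds(60);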

15) Sequences (representative)

A) Event-driven rescan (Feedser delta)

sequenceDiagram
  autonumber
  participant FE as Feedser
  participant SCH as Scheduler.Worker
  participant IDX as ImpactIndex
  participant SC as Scanner.WebService
  participant NO as Notify

  FE->>SCH: POST /events/feedser-export {exportId, changedProductKeys}
  SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
  IDX-->>SCH: bitmap(imageIds) → digests list
  SCH->>SC: POST /reports {imageDigest}  (batch/sequenced)
  SC-->>SCH: report deltas (new criticals/highs)
  alt delta>0
    SCH->>NO: rescan.delta {digest, newCriticals, links}
  end

B) Nightly rescan

sequenceDiagram
  autonumber
  participant CRON as Cron
  participant SCH as Scheduler.Worker
  participant IDX as ImpactIndex
  participant SC as Scanner.WebService

  CRON->>SCH: tick (02:00 Europe/Sofia)
  SCH->>IDX: ResolveAll(selector)
  IDX-->>SCH: candidates
  SCH->>SC: POST /reports {digest} (paced)
  SC-->>SCH: results
  SCH-->>SCH: aggregate, store run stats

C) Content-refresh (tag followers)

sequenceDiagram
  autonumber
  participant SCH as Scheduler
  participant SC as Scanner
  SCH->>SC: resolve tag→digest (if changed)
  alt digest changed
    SCH->>SC: POST /scans {imageRef}    # new SBOM
    SC-->>SCH: scan complete (artifacts)
    SCH->>SC: POST /reports {imageDigest}
  else unchanged
    SCH->>SC: POST /reports {imageDigest}  # analysis-only
  end

16) Roadmap

  • Vuln-centric impact: pre-join vuln→purl→images to rank by KEV and exploited-in-the-wild signals.
  • Policy diff preview: when a staged policy changes, show projected breakage set before promotion.
  • Cross-cluster federation: one Scheduler instance driving many Scanner clusters (tenant isolation).
  • Windows containers: integrate Zastava runtime hints for Usage view tightening.

End — component_architecture_scheduler.md