# High‑Level Architecture — **Stella Ops** (Consolidated • 2025Q4)

> **Want the 10-minute tour?** See [`high-level-architecture.md`](high-level-architecture.md); this file retains the exhaustive reference.

> **Purpose.** A complete, implementation‑ready map of Stella Ops: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
>
> **Scope.** This file **replaces** the separate `components.md`; all component details now live here.

---

## 0) Product vision & principles

**Vision.** Stella Ops is a **deterministic SBOM + VEX platform** for CI/CD and runtime, tuned for **speed** (per‑layer deltas), **quiet output** (usage‑scoped views), and **verifiability** (DSSE + Rekor v2). It is **self‑hostable**, **air‑gap capable**, and **commercially enforceable**: only licensed installations can produce **Stella Ops‑verified** attestations.

**Operating principles.**

* **Scanner‑owned SBOMs.** We generate our own BOMs; we do not warehouse third‑party SBOM content (we can **link** to attested SBOMs).
* **Deterministic evidence.** Facts come from package DBs, installed metadata, linkers, and verified attestations; no fuzzy guessing in the core.
* **Per‑layer caching.** Cache fragments by **layer digest** and compose image SBOMs via **CycloneDX BOM‑Link** / **SPDX ExternalRef**.
* **Inventory vs Usage.** Always record the full **inventory** of what exists; separately present **usage** (entrypoint closure + loaded libs).
* **Backend decides.** PASS/FAIL is produced by **Policy** + **VEX** + **Advisories**. The scanner reports facts.
* **Attest or it didn’t happen.** Every export is signed as **in‑toto/DSSE** and logged in **Rekor v2**.
* **Sovereign‑ready.** Cloud is used only for licensing and optional endorsement; everything else is first‑party and self‑hostable.

---

## 1) Service topology & trust boundaries

### 1.1 Runtime inventory (first‑party)

| Service / Tool | Container image | Core role | Scale pattern |
| --- | --- | --- | --- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports; **analysis‑only report runs** for Scheduler. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach‑O, EntryTrace); emits per‑layer SBOMs and composes image SBOMs. | Horizontal; queue‑driven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for build‑time SBOMs as OCI **referrers**. | CI‑side; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLI‑orchestrated scanner container for post‑build scans. | Local/CI; ephemeral. |
| **Concelier.WebService** | `stellaops/concelier-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| **Excititor.WebService** | `stellaops/excititor-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage‑gating); produces **policy digest**. | In‑process; cache per digest. |
| **Scheduler.WebService** | `stellaops/scheduler-web` | Schedules **re‑evaluation** runs; consumes Concelier/Excititor deltas; selects **impacted images** via BOM‑Index; orchestrates analysis‑only reports. | Stateless API. |
| **Scheduler.Worker** | `stellaops/scheduler-worker` | Executes selection and enqueues batches toward Scanner; enforces rate/limits and windows; maintains impact cursors. | Horizontal; queue‑driven. |
| **Notify.WebService** | `stellaops/notify-web` | Rules engine for outbound notifications; manages channels, templates, throttle/digest logic. | Stateless API. |
| **Notify.Worker** | `stellaops/notify-worker` | Delivers to Slack/Teams/Email/Webhooks; idempotent retries; digests. | Horizontal; per‑channel rate limits. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | On‑prem OIDC issuing **short‑lived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, **Scheduler**, **Notify**, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; **schedule** and **notify** verbs. | Local/CI. |

### 1.2 Third‑party (self‑hosted)

* **Fulcio** (Sigstore CA) — issues short‑lived signing certs (keyless).
* **Rekor v2** (tile‑backed transparency log).
* **RustFS** — offline-first object store with deterministic REST API (S3/MinIO fallback available for legacy installs).
* **MongoDB** — catalog, advisories, VEX, scheduler, notify.
* **Queue** — Redis Streams / NATS / RabbitMQ (pluggable).
* **OCI Registry** — must support **Referrers API** (discover SBOMs/signatures).

### 1.3 Cloud licensing (Stella Ops)

* **Licensing Service** (`www.stella-ops.org`) — issues long‑lived **License Tokens (LT)**; exchanges LT → **Proof‑of‑Entitlement (PoE)** bound to an installation key; revoke/introspect PoE; optional cross‑log **endorsement**.

### 1.4 Diagram (control/data planes & trust)

```mermaid
flowchart LR
  subgraph Cloud["www.stella-ops.org (Cloud)"]
    LS["Licensing Service<br/>LT→PoE / revoke / introspect"]
  end

  subgraph OnPrem["Customer Site (Self-hosted)"]
    Auth["Authority (OIDC)<br/>OpTok (DPoP/mTLS)"]
    SW[Scanner.WebService]
    WK[Scanner.Worker xN]
    CONC[Concelier]
    EXC[Excititor]
    SCHW[Scheduler.Web]
    SCH[Scheduler.Worker xN]
    NOTW[Notify.Web]
    NOT[Notify.Worker xN]
    POL["Policy Engine (in Scanner.Web)"]
    SGN["Signer<br/>(entitlement + signing)"]
    ATT["Attestor<br/>(Rekor v2 submit/verify)"]
    UI["Web UI (Angular)"]
    Z["Zastava<br/>(Runtime Inspector/Enforcer)"]
    RFS[("RustFS object store")]
    MGO[("MongoDB")]
    QUE[("Queue/Streams")]
  end

  CLI["StellaOps.Cli / Buildx Plugin"]
  REG[("OCI Registry with Referrers")]
  FUL["Fulcio"]
  REK["Rekor v2 (tiles)"]

  CLI -->|scan/build| SW
  SW -->|jobs| QUE
  QUE --> WK
  WK --> RFS
  SW --> MGO
  CONC --> MGO
  EXC --> MGO
  UI --> SW
  Z --> SW

  %% New event-driven loop
  CONC -- "export.delta" --> SCHW
  EXC -- "export.delta" --> SCHW
  SCHW --> SCH
  SCH --> SW
  SW -- "report.ready" --> NOTW
  Z -- "admission/observe" --> NOTW

  SGN <--> Auth
  SGN --> FUL
  SGN -->|mTLS| ATT
  ATT --> REK

  SGN <-->|verify referrers| REG
```

**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI/Scheduler/Notify never sign.

---

## 2) Licensing & tokens (installation‑ready, theft‑resistant)

**Token model (two licensing tokens plus one operational token).**

* **License Token (LT)** — long‑lived JWT from **Licensing Service**; used **once** to enroll the installation; never used in hot path.
* **Proof‑of‑Entitlement (PoE)** — bound to the installation key (mTLS client cert **or** DPoP‑bound JWT with `cnf`); medium‑lived; renewable; revocable.
* **Operational token (OpTok)** — 2–5 min OIDC token from **Authority**, **sender‑constrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**/**Scheduler.Web**/**Notify.Web**.

**Signer enforces both:** PoE proves entitlement; OpTok proves “who is calling now”. It also **independently verifies** that the **scanner image digest** is **Stella Ops‑signed** via **Referrers + cosign** before signing anything.

**Enrollment sequence (LT → PoE).**

```plantuml
@startuml
actor Operator
participant "Install Agent" as IA
participant "Licensing Service" as LS
Operator -> IA: Provide LT
IA -> IA: Generate K_inst
IA -> LS: /license/enroll {LT, pub(K_inst)}
LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
@enduml
```

---

## 3) Scanner subsystem (facts engine)

### 3.1 Analyzers (deterministic only)

* **OS packages:** apk/dpkg/rpm (Linux); Windows MSI/SxS/GAC (M2).
* **Language (installed state):**

  * Java (pom.properties / MANIFEST) → `pkg:maven/...`
  * Node (`node_modules/*/package.json`) → `pkg:npm/...`
  * Python (`*.dist-info/METADATA`) → `pkg:pypi/...`
  * Go (buildinfo) → `pkg:golang/...`
  * .NET (`*.deps.json`) → `pkg:nuget/...`
  * **Rust:** deterministic **language markers** (symbol mangling) and crates only when present; otherwise `bin:{sha256}`.
* **Native:** ELF/PE/Mach‑O imports, DT_NEEDED, RPATH/RUNPATH, symbol versions, PE version info.
* **EntryTrace:** parse `ENTRYPOINT`/`CMD`; shell AST; resolve launchers (Java/Node/Python) to terminal program; record file:line chain.
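
To make the "installed state" idea concrete, here is a minimal Python sketch of how a `*.dist-info/METADATA` file could be turned into a `pkg:pypi/...` identifier. The function name and sample text are hypothetical and not part of Stella Ops; the sketch only lowercases the name and replaces underscores with dashes, and it skips the percent-encoding a full purl implementation would need.

```python
from email.parser import Parser


def pypi_purl_from_metadata(metadata_text: str) -> str:
    """Derive a pkg:pypi purl from the RFC 822 style headers of a METADATA file."""
    headers = Parser().parsestr(metadata_text, headersonly=True)
    name = headers["Name"]
    version = headers["Version"]
    if not name or not version:
        raise ValueError("METADATA is missing Name or Version")
    # Simplified normalization: lowercase, underscores become dashes.
    normalized = name.strip().lower().replace("_", "-")
    return f"pkg:pypi/{normalized}@{version.strip()}"


# Hypothetical METADATA content, for illustration only.
SAMPLE_METADATA = (
    "Metadata-Version: 2.1\n"
    "Name: Typing_Extensions\n"
    "Version: 4.12.2\n"
    "\n"
    "Long description goes here.\n"
)

print(pypi_purl_from_metadata(SAMPLE_METADATA))
```

Because the identifier comes only from metadata that is actually installed in the layer, the same filesystem always yields the same purl.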

### 3.2 Caching & composition

* **Layer cache:** `{layerDigest → SBOM fragment + analyzer meta}`.
* **File CAS:** `{sha256(file) → parse result (ELF/JAR metadata/etc.)}`.
* **Composition:** build **image SBOMs** from fragments via **BOM‑Link/ExternalRef**; emit **two views**:

  * **Inventory** (complete filesystem inventory).
  * **Usage** (entrypoint closure + linked libs).
* **Transport:** JSON **and** **CycloneDX Protobuf** (compact, fast to parse).
* **Index:** BOM‑Index sidecar with purl table + roaring bitmap + `usedByEntrypoint` flag for fast joins.
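
The layer cache and composition steps above can be sketched as follows. This is an illustrative Python model, not the Scanner implementation: `LayerCache`, `compose_image_sbom`, and the analyzer callback are hypothetical names, and fragments are referenced by a storage path rather than by a real BOM‑Link URN.

```python
from typing import Callable, Dict, List


class LayerCache:
    """Caches one SBOM fragment per layer digest; analysis runs only on a miss."""

    def __init__(self, analyze: Callable[[str], dict]):
        self._analyze = analyze
        self._fragments: Dict[str, dict] = {}
        self.misses = 0

    def get(self, layer_digest: str) -> dict:
        if layer_digest not in self._fragments:
            self.misses += 1
            self._fragments[layer_digest] = self._analyze(layer_digest)
        return self._fragments[layer_digest]


def compose_image_sbom(image_digest: str, layer_digests: List[str], cache: LayerCache) -> dict:
    """Build an image-level document that links to per-layer fragments instead of copying them."""
    fragments = []
    for digest in layer_digests:
        cache.get(digest)  # ensures the fragment exists; analyzes the layer on first sight
        fragments.append({"layerDigest": digest, "ref": f"layers/{digest}/sbom.cdx.json.zst"})
    return {"image": image_digest, "fragments": fragments}


# Two images sharing a base layer: the base is analyzed once.
cache = LayerCache(analyze=lambda digest: {"layer": digest, "components": []})
first = compose_image_sbom("img-1", ["base", "app-1"], cache)
second = compose_image_sbom("img-2", ["base", "app-2"], cache)
print(cache.misses)
```

Keying on the layer digest is what makes the second image cheap: only its unique layer triggers analyzer work.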

### 3.3 Diff (image → layer → package)

* Added / Removed / Version‑changed entries, each **attributed** to the layer that caused the change.
* Raw diffs preserved; backend view applies **VEX + Policy**.
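
One way to picture layer attribution is the sketch below. It assumes a simplified model in which each layer is a mapping of package name to version (or `None` when the layer removes the package); the data model and function names are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# (layerDigest, {package: version, or None if this layer removes the package})
Layer = Tuple[str, Dict[str, Optional[str]]]


def flatten(layers: List[Layer]) -> Dict[str, Tuple[Optional[str], str]]:
    """Replay layers in order; remember the last layer that touched each package."""
    state: Dict[str, Tuple[Optional[str], str]] = {}
    for digest, changes in layers:
        for package, version in changes.items():
            state[package] = (version, digest)
    return state


def diff_images(old_layers: List[Layer], new_layers: List[Layer]) -> List[dict]:
    old = {p: v for p, (v, _) in flatten(old_layers).items() if v is not None}
    new = flatten(new_layers)
    changes = []
    for package in sorted(new):
        version, digest = new[package]
        before = old.get(package)
        if version is None:
            if before is not None:
                changes.append({"package": package, "change": "removed", "layer": digest})
        elif before is None:
            changes.append({"package": package, "change": "added", "layer": digest})
        elif before != version:
            changes.append({"package": package, "change": "changed", "layer": digest})
    # Packages that no new layer mentions disappeared with their layer (for example a base swap).
    for package in sorted(set(old) - set(new)):
        changes.append({"package": package, "change": "removed", "layer": None})
    return changes


old_image = [("sha256:base", {"openssl": "3.0.1", "curl": "8.0"}), ("sha256:app1", {"flask": "2.0"})]
new_image = [
    ("sha256:base", {"openssl": "3.0.1", "curl": "8.0"}),
    ("sha256:app2", {"flask": "3.0", "curl": None, "requests": "2.31"}),
]
for entry in diff_images(old_image, new_image):
    print(entry)
```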

### 3.4 Build‑time SBOMs (fast CI path)

* Buildx **generator** runs analyzers during `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, attaches SBOMs as **OCI referrers**.
* Scanner.WebService can trust these (policy‑configurable) and **skip** re‑scan; DSSE + Rekor v2 can be done either at build time or post‑push via Signer/Attestor.

### 3.5 Events / integrations

* **Out:** `report.ready` (summary + verdict + Rekor UUID) → internal bus for **Notify** & UI.
* **Expose:** image‑level **BOM‑Index** metadata for **Scheduler** impact selection.

---

## 4) Backend evaluation (decider)

### 4.1 Concelier (advisories)

* Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in Mongo; exports **deterministic JSON** and **Trivy DB**.
* Offline kit bundles for air‑gapped sites.

### 4.2 Excititor (VEX)

* Ingests **OpenVEX / CSAF VEX / CycloneDX VEX**; normalizes claims; retains conflicts; computes **consensus** with provider trust weights and justification gates.

### 4.3 Policy Engine (YAML DSL)

* Matchers: `image/repo/env/purl/cve/vendor/source/path/layerDigest/usedByEntrypoint`
* Actions: `ignore(until, justification)`, `fail`, `warn`, `defer`, `requireVEX{vendors, justifications}`, `escalate {sev, KEV, EPSS}`, license constraints.
* Produces a **policy digest** (SHA‑256 of canonicalized policy).
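
A policy digest of this kind can be sketched in a few lines of Python. The canonical form used here (JSON with sorted keys and compact separators) is an assumption made for illustration; the document does not specify the actual canonicalization rules, and `policy_digest` is a hypothetical name.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over a canonical serialization, so key order and whitespace do not matter."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same policy written with different key order yields the same digest.
policy_a = {"rules": [{"match": {"usedByEntrypoint": True}, "action": "fail"}], "version": 1}
policy_b = {"version": 1, "rules": [{"action": "fail", "match": {"usedByEntrypoint": True}}]}
print(policy_digest(policy_a) == policy_digest(policy_b))
```

Recording this digest alongside a verdict lets a verifier confirm which exact policy produced it.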

### 4.4 PASS/FAIL flow

1. SBOM (Inventory / Usage) → join with **Concelier** advisories.
2. Apply **Excititor** consensus (statuses & justifications).
3. Apply **Policy**; compute PASS/FAIL with waiver TTLs.
4. Sign the **final report** (DSSE via **Signer**) and log to **Rekor v2** via **Attestor**.
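
Steps 1 to 3 can be sketched as a small pure function. This is a deliberately reduced model under stated assumptions: advisories are a purl to CVE list mapping, VEX consensus is a `(purl, cve)` to status mapping using the standard `not_affected` status, and the only policy rule is a per-CVE waiver with an expiry time. All names are hypothetical; signing (step 4) is out of scope here.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def evaluate(
    components: List[str],
    advisories: Dict[str, List[str]],
    vex: Dict[Tuple[str, str], str],
    waivers: Dict[str, datetime],
    now: datetime,
) -> dict:
    findings = []
    for purl in components:
        for cve in advisories.get(purl, []):      # step 1: join SBOM with advisories
            if vex.get((purl, cve)) == "not_affected":
                continue                          # step 2: VEX consensus suppresses the finding
            waived_until = waivers.get(cve)
            if waived_until is not None and now < waived_until:
                continue                          # step 3: waiver still within its TTL
            findings.append((purl, cve))
    return {"verdict": "FAIL" if findings else "PASS", "findings": findings}


components = ["pkg:npm/a@1.0.0", "pkg:npm/b@2.0.0"]
advisories = {"pkg:npm/a@1.0.0": ["CVE-0000-0001"], "pkg:npm/b@2.0.0": ["CVE-0000-0002"]}
vex = {("pkg:npm/a@1.0.0", "CVE-0000-0001"): "not_affected"}
waivers = {"CVE-0000-0002": datetime(2025, 1, 31, tzinfo=timezone.utc)}

before_expiry = evaluate(components, advisories, vex, waivers, datetime(2025, 1, 1, tzinfo=timezone.utc))
after_expiry = evaluate(components, advisories, vex, waivers, datetime(2025, 2, 1, tzinfo=timezone.utc))
print(before_expiry["verdict"], after_expiry["verdict"])
```

The same SBOM flips from PASS to FAIL once the waiver expires, which is why re‑evaluation does not require a rescan.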

---

## 5) Runtime enforcement (Zastava)

* **Observer:** inventories running containers, checks image signatures, SBOM presence (referrers), detects drift (entrypoint chain divergence), flags unapproved images.
* **Admission Webhook (optional):** blocks policy‑fail pods (dry‑run first).
* **Integration:** posts runtime events to Scanner.WebService; can request **delta scans** on changed layers.

---

## 6) Storage & catalogs (RustFS/Mongo)

**RustFS layout (default)**

```
rustfs://stellaops/
  layers/<sha256>/sbom.cdx.json.zst
  layers/<sha256>/sbom.spdx.json.zst
  images/<imgDigest>/inventory.cdx.pb
  images/<imgDigest>/usage.cdx.pb
  indexes/<imgDigest>/bom-index.bin
  attest/<artifactSha256>.dsse.json
```

**Catalog (Mongo)**

* `artifacts` (type/format/sha/size/rekor/ttl/immutable/refCount/createdAt)
* `images`, `layers`, `links`, `lifecycleRules`
* **Scheduler:** `schedules`, `runs`, `locks`, `impact_cursors`
* **Notify:** `rules`, `deliveries`, `channels`, `templates`

**Retention**

* RustFS applies retention via `X-RustFS-Retain-Seconds`; Scanner.WebService GC decrements `refCount` and deletes unreferenced metadata; S3/MinIO fallback retains native Object Lock when enabled.

---

## 7) APIs (consolidated surface)

### 7.1 Scanner.WebService

```
POST /api/scans                  { imageRef|digest, force? } → { scanId }
GET  /api/scans/{id}             → { status, digests, artifacts[] }
GET  /api/sboms/{imageDigest}    ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET  /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports                { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports                { imageDigest, policyRevision?, vexSnapshot? } → { reportId, verdict, rekorUrl }
GET  /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
GET  /healthz | /readyz | /metrics
```

### 7.2 Signer (mTLS; hard gate)

```
POST /sign/dsse         # body: {subjectHash, imageDigest, predicate}; headers: OpTok (DPoP/mTLS) + PoE
GET  /verify/referrers?imageDigest=sha256:...   # is this image StellaOps-signed?
```

### 7.3 Attestor (mTLS)

```
POST /rekor/entries        # DSSE bundle → {uuid, index, proof, logURL}
GET  /rekor/entries/{uuid}
```

### 7.4 Authority (OIDC)

* `/.well-known/openid-configuration`, `/oauth/token` (DPoP/mTLS), `/oauth/introspect`, `/jwks`

### 7.5 Licensing (cloud)

```
POST /license/enroll     { LT, pubKey } → PoE + introspection endpoints
POST /license/revoke     { license_id } → ok
POST /license/introspect { poe } → { active, claims, exp }
POST /attest/endorse     { bundle } → endorsement bundle (optional)
```

### 7.6 Scheduler

```
POST /api/v1/scheduler/schedules   {yaml|json} → { scheduleId }
GET  /api/v1/scheduler/schedules   → [ { id, nextRun, status, stats } ]
POST /api/v1/scheduler/run         { id|selector } → { runId }
GET  /api/v1/scheduler/runs/{id}   → { status, counts, links }
GET  /api/v1/scheduler/cursor      → { lastConcelierExportId, lastExcititorExportId }
```

### 7.7 Notify

```
POST /api/v1/notify/test        { channel, target } → { delivered }
POST /api/v1/notify/rules       {yaml|json} → { ruleId }
GET  /api/v1/notify/rules       → [ { id, match, actions, enabled } ]
GET  /api/v1/notify/deliveries  → [ { id, eventId, channel, status, attempts } ]
```

---

## 8) Security & verifiability

* **Sender‑constrained tokens.** All operational calls use **DPoP** (RFC 9449) or **mTLS‑bound** tokens (RFC 8705).
* **Entitlement.** **PoE** is mandatory; revocation honored online.
* **Release integrity.** **Signer** independently verifies **scanner image digest** via **Referrers + cosign** before signing.
* **Separation of duties.** Scanner/UI/Scheduler/Notify cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
* **Verifiers.** Anyone can verify: DSSE signature → certificate chain to **Stella Ops Fulcio/KMS root** → **Rekor v2** inclusion.
* **RBAC.** Roles: `scanner.admin|read`, `scheduler.admin|read`, `notify.admin|read`, `zastava.admin|read`.
* **Community vs Authorized.** Free/community installations run throttled and produce no official attestations; authorized installations run at full speed and produce **Stella Ops‑verified** bundles.

**DSSE predicate (SBOM/report)**

```json
{
  "predicateType": "https://stella-ops.org/attestations/sbom/1",
  "subject": [{ "name": "s3://stellaops/images/<digest>/inventory.cdx.pb", "digest": { "sha256": "<sha256>" } }],
  "predicate": {
    "image_digest": "<sha256:...>",
    "stellaops_version": "2.3.1 (2027.04)",
    "license_id": "LIC-9F2A...",
    "customer_id": "CUST-ACME",
    "plan": "pro",
    "policy_digest": "sha256:...",
    "views": ["inventory","usage"],
    "created": "2025-10-17T12:34:56Z"
  }
}
```
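
For readers unfamiliar with DSSE, the bytes that actually get signed are not the JSON itself but its pre‑authentication encoding (PAE), as defined by the DSSE specification. The sketch below shows only that encoding step in Python; how Signer obtains its key or certificate is not modeled, and the sample statement is a shortened placeholder.

```python
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 <len(type)> <type> <len(body)> <body>'."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join(
        [
            b"DSSEv1",
            str(len(type_bytes)).encode("ascii"),
            type_bytes,
            str(len(payload)).encode("ascii"),
            payload,
        ]
    )


# Shortened, illustrative statement; real predicates carry the fields shown above.
statement = json.dumps(
    {"predicateType": "https://stella-ops.org/attestations/sbom/1", "subject": [], "predicate": {}}
).encode("utf-8")

to_be_signed = pae("application/vnd.in-toto+json", statement)
print(to_be_signed[:40])
```

Binding the payload type into the signed bytes prevents a signature over one kind of document from being replayed as another.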

**BOM‑Index sidecar**

Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags for fast policy joins.
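
The idea behind the index can be illustrated with plain Python integers standing in for roaring bitmaps: each purl maps to a bitmap whose set bits identify the images that contain it. The class and method names are hypothetical, and the real sidecar is a binary format rather than an in‑memory structure.

```python
from typing import Dict, Iterable, List


class BomIndex:
    """Toy purl -> image bitmap index; Python ints stand in for roaring bitmaps."""

    def __init__(self) -> None:
        self._bit_of: Dict[str, int] = {}      # imageDigest -> bit position
        self._images: List[str] = []           # bit position -> imageDigest
        self._contains: Dict[str, int] = {}    # purl -> bitmap of images containing it
        self._used: Dict[str, int] = {}        # purl -> bitmap where usedByEntrypoint is set

    def add(self, image_digest: str, purl: str, used_by_entrypoint: bool = False) -> None:
        bit = self._bit_of.setdefault(image_digest, len(self._images))
        if bit == len(self._images):
            self._images.append(image_digest)
        self._contains[purl] = self._contains.get(purl, 0) | (1 << bit)
        if used_by_entrypoint:
            self._used[purl] = self._used.get(purl, 0) | (1 << bit)

    def impacted(self, changed_purls: Iterable[str], usage_only: bool = False) -> List[str]:
        table = self._used if usage_only else self._contains
        mask = 0
        for purl in changed_purls:
            mask |= table.get(purl, 0)         # union of all images touching any changed purl
        return [digest for bit, digest in enumerate(self._images) if (mask >> bit) & 1]


index = BomIndex()
index.add("img-a", "pkg:npm/left-pad@1.3.0", used_by_entrypoint=True)
index.add("img-b", "pkg:npm/left-pad@1.3.0")
index.add("img-c", "pkg:pypi/requests@2.31.0", used_by_entrypoint=True)
print(index.impacted(["pkg:npm/left-pad@1.3.0"]))
print(index.impacted(["pkg:npm/left-pad@1.3.0"], usage_only=True))
```

Selecting impacted images becomes a union of a few bitmaps rather than a scan over every stored SBOM.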

---

## 9) Scale, performance & quotas

* **Workers:** horizontal; **distributed lock per layer digest**; global CAS in the object store (RustFS by default, MinIO in S3 fallback mode).
* **Queues:** Redis Streams / NATS / RabbitMQ. HPA by queue depth, CPU, memory.
* **Registry throttling:** per‑registry concurrency budgets.
* **Targets:**

  * Build‑time path P95 ≤ 3–5 s on warmed bases.
  * Post‑build delta scan P95 ≤ 10 s for 200 MB images.
  * Policy + VEX evaluation ≤ 500 ms for 5k components using BOM‑Index.
  * **Event → notification** P95 ≤ **30–60 s** under nominal load.
  * **Export delta → re‑evaluation verdict** P95 ≤ **5 min** for 10k impacted images.
* **Quotas:** license plan enforces QPS/concurrency/size; **Signer** throttles and can deny DSSE.

---

## 10) DevOps & distribution

* **Releases:** all first‑party images **cosign‑signed**; labels embed `org.stellaops.version` and `org.stellaops.release_date`.
* **Channels:**

  * **Community** (public registry): throttled, non‑attesting.
  * **Authorized** (private registry): full speed, DSSE enabled.
* **Client update flow:** containers self‑verify signatures at boot; report version; **Signer** enforces `valid_release_year` / `max_version` from PoE before signing.
* **Compose skeleton:**

```yaml
services:
  authority:        { image: stellaops/authority }
  fulcio:           { image: sigstore/fulcio }
  rekor:            { image: sigstore/rekor-v2 }
  minio:            { image: minio/minio, command: server /data --console-address ":9001" }
  mongo:            { image: mongo:7 }
  signer:           { image: stellaops/signer, depends_on: [authority, fulcio] }
  attestor:         { image: stellaops/attestor, depends_on: [rekor, signer] }
  scanner-web:      { image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
  scanner-worker:   { image: stellaops/scanner-worker, deploy: { replicas: 4 }, depends_on: [scanner-web] }
  concelier:        { image: stellaops/concelier-web, depends_on: [mongo] }
  excititor:        { image: stellaops/excititor-web, depends_on: [mongo] }
  scheduler-web:    { image: stellaops/scheduler-web, depends_on: [mongo] }
  scheduler-worker: { image: stellaops/scheduler-worker, deploy: { replicas: 2 }, depends_on: [scheduler-web] }
  notify-web:       { image: stellaops/notify-web, depends_on: [mongo] }
  notify-worker:    { image: stellaops/notify-worker, deploy: { replicas: 2 }, depends_on: [notify-web] }
  ui:               { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor, scheduler-web, notify-web] }
```

* **Backups:** Mongo dumps; RustFS snapshots (or S3 versioning when fallback driver is used); Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation.
* **Ops runbooks:** Scheduler catch‑up after Concelier/Excititor recovery; connector key rotation (Slack/Teams/SMTP).
* **SLOs & alerts:** lag between Concelier/Excititor export and first rescan verdict; delivery failure rates by channel.

---

## 11) Observability & audit

* **Metrics:** scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
* **Scheduler metrics:** `scheduler.impacted_images_total`, `scheduler.jobs_enqueued_total`, `scheduler.selection_ms`, end‑to‑end p95 (event → verdict).
* **Notify metrics:** `notify.sent_total{channel}`, `notify.dropped_total{reason}`, `notify.digest_coalesced_total`, `notify.latency_ms`.
* **Tracing:** per‑stage spans; correlation IDs across Scanner→Signer→Attestor and Concelier/Excititor→Scheduler→Scanner→Notify.
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID; Scheduler records who scheduled what; Notify records where, when, and why messages were sent or deduped.
* **Compliance:** RustFS retention headers (or MinIO Object Lock when operating in S3 mode) keep immutable artifacts tamper‑resistant; reproducible outputs via policy digest + SBOM digest in predicate.

---

## 12) Roadmap (anchored to this architecture)

* M2: Windows MSI/SxS/GAC analyzers; deeper Rust (DWARF enrichers).
* M2: Buildx generator certified flows; cross‑registry trust policies.
* M3: Patch‑Presence plugin (signature‑based backport detection), opt‑in.
* M3: Zastava Admission control GA with policy presets and dry‑run→enforce stages.
* M3: **Scheduler GA** with export‑delta impact routing and capacity‑aware pacing.
* M3: **Notify GA** with digests, Slack/Teams/Email/Webhooks; **M4:** PagerDuty/Opsgenie connectors.
* Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.

---

## 13) Canonical sequences (verification, re‑evaluation & notify)

**Sign & log (OpTok + PoE, image verify, DSSE, Rekor).**

```mermaid
sequenceDiagram
  autonumber
  participant Scan as Scanner.WebService
  participant Auth as Authority (OIDC)
  participant Sign as Signer
  participant Reg as OCI Registry
  participant Ful as Fulcio/KMS
  participant Att as Attestor
  participant Rek as Rekor v2

  Scan->>Auth: Get OpTok (DPoP/mTLS)
  Scan->>Sign: sign(request) + OpTok + PoE + DPoP proof
  Sign->>Auth: Validate OpTok & sender-constraint
  Sign->>Sign: Validate PoE (introspect/revocation)
  Sign->>Reg: Verify scanner image is StellaOps-signed (Referrers + cosign)
  alt OK
    Sign->>Ful: Get signing cert (keyless) or use KMS key
    Sign-->>Scan: DSSE bundle (cert chain)
    Scan->>Att: Submit bundle
    Att->>Rek: Create entry
    Rek-->>Att: {uuid,index,proof}
    Att-->>Scan: Rekor URL
  else Deny
    Sign-->>Scan: 403 (no attestation)
  end
```

**Event‑driven re‑evaluation & notify.**

```mermaid
sequenceDiagram
  participant CONC as Concelier
  participant EXC as Excititor
  participant SCH as Scheduler
  participant SC as Scanner.WebService
  participant NO as Notify

  CONC->>SCH: export.delta {changedProductKeys, exportId}
  EXC->>SCH: export.delta {changedProductKeys, exportId}
  SCH->>SCH: Impact select via BOM-Index bitmaps
  SCH->>SC: Enqueue analysis-only reports (batches)
  SC-->>SCH: verdict stream (PASS/FAIL, deltas)
  SCH->>NO: rescan.delta {imageDigest, newCriticals, links}
  NO-->>Slack/Teams/Email/Webhook: deliver (throttle/digest rules applied)
```

---

## 14) Minimal data shapes (Scheduler & Notify)

**Scheduler schedule (YAML via UI/CLI)**

```yaml
name: nightly-eu
when: "0 2 * * * Europe/Sofia"
mode: analysis-only # or content-refresh
selection:
  scope: all-images # or tenant/ns/repo label selectors
  onlyIf: { lastReportOlderThanDays: 7 }
notify:
  onNewFindings: true
  minSeverity: high
limits:
  maxJobs: 5000
  ratePerSecond: 50
```
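
The `limits` block can be read as a cap plus a pacing rate. The sketch below shows one plausible interpretation in Python, where the selected images are truncated to `maxJobs` and split into batches of `ratePerSecond`, one batch per second; this reading and the function name are assumptions, not documented Scheduler behaviour.

```python
from typing import List


def plan_batches(image_digests: List[str], max_jobs: int, rate_per_second: int) -> List[List[str]]:
    """Cap the run at max_jobs, then split it into per-second batches."""
    if max_jobs < 0 or rate_per_second <= 0:
        raise ValueError("max_jobs must be >= 0 and rate_per_second must be > 0")
    capped = image_digests[:max_jobs]
    return [capped[i : i + rate_per_second] for i in range(0, len(capped), rate_per_second)]


# 12,000 selected images with the limits from the schedule above.
selected = [f"sha256:{n:064x}" for n in range(12_000)]
batches = plan_batches(selected, max_jobs=5000, rate_per_second=50)
print(len(batches), len(batches[0]))
```

Under this reading, a run capped at 5000 jobs at 50 jobs per second takes at least 100 seconds to enqueue.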

**Notify rule (YAML)**

```yaml
name: high-critical-alerts
match:
  eventKinds: ["report.ready","rescan.delta","zastava.admission"]
  minSeverity: high
  namespaces: ["prod-*"]
  vex: { includeAcceptedJustifications: false }
actions:
  - channel: slack
    target: "#sec-alerts"
    template: "concise"
    throttle: "5m"
  - channel: email
    target: "soc@acme.org"
    digest: "hourly"
enabled: true
```
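
A rule like this boils down to a match predicate plus a per‑target throttle. The Python sketch below illustrates both under stated assumptions: the event field names (`kind`, `severity`, `namespace`), the severity ordering, and the glob semantics for `namespaces` are hypothetical, and the `vex` filter is not modeled.

```python
from fnmatch import fnmatchcase
from typing import Dict

SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed ordering


def matches(rule_match: dict, event: dict) -> bool:
    """True when the event kind, severity, and namespace all satisfy the rule."""
    if event["kind"] not in rule_match["eventKinds"]:
        return False
    if SEVERITY_ORDER.index(event["severity"]) < SEVERITY_ORDER.index(rule_match["minSeverity"]):
        return False
    return any(fnmatchcase(event["namespace"], pattern) for pattern in rule_match["namespaces"])


class Throttle:
    """Allows at most one delivery per key within a window of seconds."""

    def __init__(self, window_seconds: float) -> None:
        self._window = window_seconds
        self._last_sent: Dict[str, float] = {}

    def allow(self, key: str, now: float) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self._window:
            return False
        self._last_sent[key] = now
        return True


rule_match = {
    "eventKinds": ["report.ready", "rescan.delta", "zastava.admission"],
    "minSeverity": "high",
    "namespaces": ["prod-*"],
}
event = {"kind": "rescan.delta", "severity": "critical", "namespace": "prod-eu"}
slack_throttle = Throttle(window_seconds=5 * 60)  # the "5m" throttle from the rule

if matches(rule_match, event) and slack_throttle.allow("slack:#sec-alerts", now=0.0):
    print("deliver to #sec-alerts")
```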