# High‑Level Architecture — **Stella Ops** (Consolidated • 2025Q4)

> **Want the 10-minute tour?** See [`high-level-architecture.md`](high-level-architecture.md); this file retains the exhaustive reference.

> **Purpose.** A complete, implementation‑ready map of Stella Ops: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
> **Scope.** This file **replaces** the separate `components.md`; all component details now live here.

---

## 0) Product vision & principles

**Vision.** Stella Ops is a **deterministic SBOM + VEX platform** for CI/CD and runtime, tuned for **speed** (per‑layer deltas), **quiet output** (usage‑scoped views), and **verifiability** (DSSE + Rekor v2). It is **self‑hostable**, **air‑gap capable**, and **commercially enforceable**: only licensed installations can produce **Stella Ops‑verified** attestations.

**Operating principles.**

* **Scanner‑owned SBOMs.** We generate our own BOMs; we do not warehouse third‑party SBOM content (we can **link** to attested SBOMs).
* **Deterministic evidence.** Facts come from package DBs, installed metadata, linkers, and verified attestations; no fuzzy guessing in the core.
* **Per-layer caching.** Cache fragments by **layer digest** and compose image SBOMs via **CycloneDX BOM-Link** / **SPDX ExternalRef**.
* **Inventory vs Usage.** Always record the full **inventory** of what exists; separately present **usage** (entrypoint closure + loaded libs).
* **Backend decides.** PASS/FAIL is produced by **Policy** + **VEX** + **Advisories**. The scanner reports facts.
* **VEX-first triage UX.** Operators triage by artifact with evidence-first cards, VEX decisioning, and immutable audit bundles; see `docs/product-advisories/archived/27-Nov-2025-superseded/28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md`.
* **Attest or it didn't happen.** Every export is signed as **in-toto/DSSE** and logged in **Rekor v2**.
* **Hybrid reachability attestations.** Every reachability graph ships with a graph-level DSSE (mandatory) plus optional edge-bundle DSSEs for runtime/init/contested edges; Policy/Signals consume graph DSSE as baseline and edge bundles for quarantine/disputes. See `docs/reachability/hybrid-attestation.md` for verification runbooks, Rekor guidance, and offline replay steps.
* **Sovereign-ready.** Cloud is used only for licensing and optional endorsement; everything else is first-party and self-hostable.
* **Competitive clarity.** Moats: deterministic replay, hybrid reachability proofs, lattice VEX, sovereign crypto, proof graph; see `docs/market/competitive-landscape.md`.

---

## 1) Service topology & trust boundaries

### 1.1 Runtime inventory (first‑party)

| Service / Tool | Container image | Core role | Scale pattern |
| --- | --- | --- | --- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports; **analysis‑only report runs** for Scheduler. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach‑O, EntryTrace); emits per‑layer SBOMs and composes image SBOMs. | Horizontal; queue‑driven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for build‑time SBOMs as OCI **referrers**. | CI‑side; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLI‑orchestrated scanner container for post‑build scans. | Local/CI; ephemeral. |
| **Concelier.WebService** | `stellaops/concelier-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via PostgreSQL locks. |
| **Excititor.WebService** | `stellaops/excititor-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via PostgreSQL locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage‑gating); produces **policy digest**. | In‑process; cache per digest. |
| **Scheduler.WebService** | `stellaops/scheduler-web` | Schedules **re‑evaluation** runs; consumes Concelier/Excititor deltas; selects **impacted images** via BOM‑Index; orchestrates analysis‑only reports. | Stateless API. |
| **Scheduler.Worker** | `stellaops/scheduler-worker` | Executes selection and enqueues batches toward Scanner; enforces rate/limits and windows; maintains impact cursors. | Horizontal; queue‑driven. |
| **Notify.WebService** | `stellaops/notify-web` | Rules engine for outbound notifications; manages channels, templates, throttle/digest logic. | Stateless API. |
| **Notify.Worker** | `stellaops/notify-worker` | Delivers to Slack/Teams/Email/Webhooks; idempotent retries; digests. | Horizontal; per‑channel rate limits. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | On‑prem OIDC issuing **short‑lived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, vulnerability triage (artifact-first), audit bundles, **Scheduler**, **Notify**, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; **schedule** and **notify** verbs. | Local/CI. |

### 1.2 Third‑party (self‑hosted)

* **Fulcio** (Sigstore CA) — issues short‑lived signing certs (keyless).
* **Rekor v2** (tile‑backed transparency log).
* **RustFS** — offline-first object store with deterministic REST API (S3/MinIO fallback available for legacy installs).
* **PostgreSQL** (≥16) — primary control-plane storage with per-module schema isolation (`auth`, `vuln`, `vex`, `scheduler`, `notify`, `policy`, `audit`). See [Database Architecture](#database-architecture-postgresql).
* **Queue** — Redis Streams / NATS / RabbitMQ (pluggable).
* **OCI Registry** — must support **Referrers API** (discover SBOMs/signatures).

### 1.3 Cloud licensing (Stella Ops)

* **Licensing Service** (`www.stella-ops.org`) — issues long‑lived **License Tokens (LT)**; exchanges LT → **Proof‑of‑Entitlement (PoE)** bound to an installation key; revoke/introspect PoE; optional cross‑log **endorsement**.

### 1.4 Diagram (control/data planes & trust)

```mermaid
flowchart LR
  subgraph Cloud["www.stella-ops.org (Cloud)"]
    LS["Licensing Service<br/>LT→PoE / revoke / introspect"]
  end

  subgraph OnPrem["Customer Site (Self-hosted)"]
    Auth["Authority (OIDC)<br/>OpTok (DPoP/mTLS)"]
    SW["Scanner.WebService"]
    WK["Scanner.Worker xN"]
    CONC["Concelier"]
    EXC["Excititor"]
    SCHW["Scheduler.Web"]
    SCH["Scheduler.Worker xN"]
    NOTW["Notify.Web"]
    NOT["Notify.Worker xN"]
    POL["Policy Engine (in Scanner.Web)"]
    SGN["Signer<br/>(entitlement + signing)"]
    ATT["Attestor<br/>(Rekor v2 submit/verify)"]
    UI["Web UI (Angular)"]
    Z["Zastava<br/>(Runtime Inspector/Enforcer)"]
    RFS[("RustFS object store")]
    PG[("PostgreSQL")]
    QUE[("Queue/Streams")]
  end

  CLI["StellaOps.Cli / Buildx Plugin"]
  REG[("OCI Registry with Referrers")]
  FUL["Fulcio"]
  REK["Rekor v2 (tiles)"]

  CLI -->|scan/build| SW
  SW -->|jobs| QUE
  QUE --> WK
  WK --> RFS
  SW --> PG
  CONC --> PG
  EXC --> PG
  UI --> SW
  Z --> SW

  %% New event-driven loop
  CONC -- export.delta --> SCHW
  EXC -- export.delta --> SCHW
  SCHW --> SCH
  SCH --> SW
  SW -- report.ready --> NOTW
  Z -- admission/observe --> NOTW

  SGN <--> Auth
  SGN --> FUL
  SGN -->|mTLS| ATT
  ATT --> REK

  SGN <-->|verify referrers| REG
```

**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI/Scheduler/Notify never sign.

---

## 2) Licensing & tokens (installation‑ready, theft‑resistant)

**Two‑token model (plus a one‑time license token).**

* **License Token (LT)** — long‑lived JWT from **Licensing Service**; used **once** to enroll the installation; never used in hot path.
* **Proof‑of‑Entitlement (PoE)** — bound to the installation key (mTLS client cert **or** DPoP‑bound JWT with `cnf`); medium‑lived; renewable; revocable.
* **Operational token (OpTok)** — 2–5 min OIDC token from **Authority**, **sender‑constrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**/**Scheduler.Web**/**Notify.Web**.

**Signer enforces both hot‑path tokens:** PoE proves entitlement; OpTok proves “who is calling now”. It also **independently verifies** that the **scanner image digest** is **Stella Ops‑signed** via **Referrers + cosign** before signing anything.

**Enrollment sequence (LT → PoE).**

```plantuml
@startuml
actor Operator
participant "Install Agent" as IA
participant "Licensing Service" as LS
Operator -> IA: Provide LT
IA -> IA: Generate K_inst
IA -> LS: /license/enroll {LT, pub(K_inst)}
LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
@enduml
```

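
The three Signer checks described above (OpTok sender constraint, PoE, release integrity) can be sketched as follows. This is a minimal illustration, not the Signer's code: `signer_gate`, its parameters, and the return strings are hypothetical, and the PoE and image checks are reduced to booleans. Only the thumbprint comparison follows a standard (the `cnf.jkt` claim of RFC 9449, computed as an RFC 7638 JWK thumbprint). A real gate would also verify token signatures, expiry, and the DPoP proof itself.

```python
import base64
import hashlib
import json

# Members that RFC 7638 (and RFC 8037 for OKP) require in the thumbprint input.
_REQUIRED = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint, base64url-encoded without padding."""
    members = {name: jwk[name] for name in _REQUIRED[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def signer_gate(optok_claims: dict, proof_jwk: dict,
                poe_active: bool, scanner_image_signed: bool) -> str:
    """Hypothetical decision function: returns "sign" or a "deny: ..." reason."""
    # OpTok must be bound to the key that produced the DPoP proof.
    if optok_claims.get("cnf", {}).get("jkt") != jwk_thumbprint(proof_jwk):
        return "deny: OpTok is not bound to the presented DPoP key"
    # PoE proves entitlement (assumed to be introspected elsewhere).
    if not poe_active:
        return "deny: PoE inactive or revoked"
    # Release integrity (assumed to be verified via Referrers + cosign elsewhere).
    if not scanner_image_signed:
        return "deny: scanner image is not Stella Ops-signed"
    return "sign"
```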
---

## 3) Scanner subsystem (facts engine)

### 3.1 Analyzers (deterministic only)

* **OS packages:** apk/dpkg/rpm (Linux); Windows MSI/SxS/GAC (M2).
* **Language (installed state):**

  * Java (pom.properties / MANIFEST) → `pkg:maven/...`
  * Node (`node_modules/*/package.json`) → `pkg:npm/...`
  * Python (`*.dist-info/METADATA`) → `pkg:pypi/...`
  * Go (buildinfo) → `pkg:golang/...`
  * .NET (`*.deps.json`) → `pkg:nuget/...`
* **Rust:** deterministic **language markers** (symbol mangling) and crates only when present; otherwise `bin:{sha256}`.
* **Native:** ELF/PE/Mach‑O imports, DT_NEEDED, RPATH/RUNPATH, symbol versions, PE version info.
* **EntryTrace:** parse `ENTRYPOINT`/`CMD`; shell AST; resolve launchers (Java/Node/Python) to terminal program; record file:line chain.

### 3.2 Caching & composition

* **Layer cache:** `{layerDigest → SBOM fragment + analyzer meta}`.
* **File CAS:** `{sha256(file) → parse result (ELF/JAR metadata/etc.)}`.
* **Composition:** build **image SBOMs** from fragments via **BOM‑Link/ExternalRef**; emit **two views**:

  * **Inventory** (complete filesystem inventory).
  * **Usage** (entrypoint closure + linked libs).
* **Transport:** JSON **and** **CycloneDX Protobuf** (compact, fast to parse).
* **Index:** BOM‑Index sidecar with purl table + roaring bitmap + `usedByEntrypoint` flag for fast joins.

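
To make the caching idea concrete, here is a minimal in-memory sketch of a layer cache feeding image composition. It is an illustration under simplifying assumptions, not the Scanner's implementation: `Component`, `LayerCache`, and `compose` are hypothetical names, fragments are plain tuples rather than BOM-Link references, and the usage view is derived from a per-component flag instead of a real entrypoint closure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    purl: str
    used_by_entrypoint: bool = False


class LayerCache:
    """Maps layer digest -> SBOM fragment so a shared layer is analysed once."""

    def __init__(self):
        self._fragments = {}

    def get_or_analyze(self, layer_digest, analyze):
        # Run the analyzer only on a cache miss.
        if layer_digest not in self._fragments:
            self._fragments[layer_digest] = tuple(analyze(layer_digest))
        return self._fragments[layer_digest]


def compose(layer_digests, cache, analyze):
    """Return (inventory, usage) views for an image made of the given layers."""
    inventory = {}
    for digest in layer_digests:
        for component in cache.get_or_analyze(digest, analyze):
            # The same purl seen in several layers is recorded once.
            inventory[component.purl] = component
    usage = {purl: c for purl, c in inventory.items() if c.used_by_entrypoint}
    return inventory, usage
```

Two images that share a base layer trigger only one analysis of that layer; the second `compose` call reuses the cached fragment.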
### 3.3 Diff (image → layer → package)

* Added / Removed / Version‑changed changes, **attributed** to the layer that caused them.
* Raw diffs preserved; backend view applies **VEX + Policy**.

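
A layer-attributed diff can be sketched as below. The data shape (an ordered list of `(layer_digest, {package: version})` pairs) and the function names are assumptions made for the example; the attribution rule used here (the topmost layer that sets a package owns it) is one plausible reading of "attributed to the layer that caused them".

```python
def flatten(layers):
    """Collapse ordered layers into {package: (version, introducing_layer)}."""
    state = {}
    for digest, packages in layers:  # bottom layer first
        for name, version in packages.items():
            state[name] = (version, digest)  # the topmost layer wins
    return state


def diff_images(old_layers, new_layers):
    """Return added/removed/changed packages with the responsible layer."""
    old, new = flatten(old_layers), flatten(new_layers)
    added = sorted((n, v, layer) for n, (v, layer) in new.items() if n not in old)
    removed = sorted((n, v, layer) for n, (v, layer) in old.items() if n not in new)
    changed = sorted(
        (n, old[n][0], v, layer)
        for n, (v, layer) in new.items()
        if n in old and old[n][0] != v
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Added and changed entries point at a layer of the new image; removed entries point at the layer of the old image that last carried the package.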
### 3.4 Build‑time SBOMs (fast CI path)

* Buildx **generator** runs analyzers during `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, attaches SBOMs as **OCI referrers**.
* Scanner.WebService can trust these (policy‑configurable) and **skip** re‑scan; DSSE + Rekor v2 can be done either at build time or post‑push via Signer/Attestor.

### 3.5 Events / integrations

* **Out:** `report.ready` (summary + verdict + Rekor UUID) → internal bus for **Notify** & UI.
* **Expose:** image‑level **BOM‑Index** metadata for **Scheduler** impact selection.

---

## 4) Backend evaluation (decider)

### 4.1 Concelier (advisories)

* Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in PostgreSQL; exports **deterministic JSON** and **Trivy DB**.
* Offline kit bundles for air‑gapped sites.

### 4.2 Excititor (VEX)

* Ingests **OpenVEX / CSAF VEX / CycloneDX VEX**; normalizes claims; retains conflicts; computes **consensus** with provider trust weights and justification gates.

### 4.3 Policy Engine (YAML DSL)

* Matchers: `image/repo/env/purl/cve/vendor/source/path/layerDigest/usedByEntrypoint`
* Actions: `ignore(until, justification)`, `fail`, `warn`, `defer`, `requireVEX{vendors, justifications}`, `escalate {sev, KEV, EPSS}`, license constraints.
* Produces a **policy digest** (SHA‑256 of canonicalized policy).

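
A policy digest of this kind can be sketched in a few lines. The canonical form is not specified in this document, so the example assumes one: the parsed policy is serialized as JSON with sorted keys and no insignificant whitespace, then hashed. `policy_digest` and the sample policy fields are hypothetical.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over an assumed canonical JSON form of the parsed policy."""
    canonical = json.dumps(
        policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because keys are sorted before hashing, two policy files that differ only in key order or formatting produce the same digest, which is what makes the digest usable as a cache key and as an audit reference.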
### 4.4 PASS/FAIL flow

1. SBOM (Inventory / Usage) → join with **Concelier** advisories.
2. Apply **Excititor** consensus (statuses & justifications).
3. Apply **Policy**; compute PASS/FAIL with waiver TTLs.
4. Sign the **final report** (DSSE via **Signer**) and log to **Rekor v2** via **Attestor**.

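
Steps 1 to 3 above can be sketched as a small pure function. Everything here is a simplified assumption for illustration: `Finding` stands for an SBOM component already joined with an advisory, VEX consensus is a lookup keyed by `(purl, cve)` using the OpenVEX status names, waivers are expiry dates per CVE, and the policy is reduced to a severity list plus usage gating. Signing (step 4) is out of scope.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    purl: str
    cve: str
    severity: str
    used_by_entrypoint: bool


def evaluate(findings, vex_status, waivers, today,
             fail_on=("critical", "high"), usage_only=True):
    """Return ("PASS" | "FAIL", blocking_findings)."""
    blocking = []
    for finding in findings:
        # Step 2: VEX consensus can clear a finding.
        if vex_status.get((finding.purl, finding.cve)) in ("not_affected", "fixed"):
            continue
        # Step 3a: a waiver applies only until its expiry date (TTL).
        until = waivers.get(finding.cve)
        if until is not None and today <= until:
            continue
        # Step 3b: usage gating ignores components outside the entrypoint closure.
        if usage_only and not finding.used_by_entrypoint:
            continue
        if finding.severity in fail_on:
            blocking.append(finding)
    return ("FAIL" if blocking else "PASS"), blocking
```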
---

## 5) Runtime enforcement (Zastava)

* **Observer:** inventories running containers, checks image signatures, SBOM presence (referrers), detects drift (entrypoint chain divergence), flags unapproved images.
* **Admission Webhook (optional):** blocks policy‑fail pods (dry‑run first).
* **Integration:** posts runtime events to Scanner.WebService; can request **delta scans** on changed layers.

---

## 6) Storage & catalogs (RustFS/PostgreSQL)

**RustFS layout (default)**

```
rustfs://stellaops/
  layers/<sha256>/sbom.cdx.json.zst
  layers/<sha256>/sbom.spdx.json.zst
  images/<imgDigest>/inventory.cdx.pb
  images/<imgDigest>/usage.cdx.pb
  indexes/<imgDigest>/bom-index.bin
  attest/<artifactSha256>.dsse.json
```

### Database Architecture (PostgreSQL)

StellaOps uses PostgreSQL for all control-plane data with **per-module schema isolation**. Each module owns and manages only its own schema, ensuring clear ownership and independent migration lifecycles.

**Schema topology:**

```
┌─────────────────────────────────────────────────────────────────┐
│                       PostgreSQL Cluster                        │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                    stellaops (database)                     ││
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         ││
│  │  │  auth   │  │  vuln   │  │   vex   │  │scheduler│         ││
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘         ││
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐                      ││
│  │  │ notify  │  │ policy  │  │  audit  │                      ││
│  │  └─────────┘  └─────────┘  └─────────┘                      ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
```

**Schema ownership:**

| Schema | Owner Module | Purpose |
|--------|--------------|---------|
| `auth` | Authority | Identity, authentication, authorization, licensing, sessions |
| `vuln` | Concelier | Vulnerability advisories, CVSS, affected packages, sources |
| `vex` | Excititor | VEX statements, graphs, observations, evidence, consensus |
| `scheduler` | Scheduler | Jobs, triggers, workers, locks, execution history |
| `notify` | Notify | Channels, templates, rules, deliveries, escalations |
| `policy` | Policy | Policy packs, rules, risk profiles, evaluations |
| `audit` | Shared | Cross-cutting audit log (optional) |

**Key design principles:**

1. **Module isolation** — Each module controls only its own schema. Cross-schema queries are rare and explicitly documented.
2. **Multi-tenancy** — Single database, single schema set, `tenant_id` column on all tenant-scoped tables with row-level security.
3. **Forward-only migrations** — No down migrations; fixes are applied as new forward migrations.
4. **Advisory lock coordination** — Startup migrations use `pg_try_advisory_lock(hashtext('schema_name'))` to prevent concurrent execution.
5. **Air-gap compatible** — All migrations embedded in assemblies, no external network dependencies.

**Migration categories:**

| Category | Prefix | Execution | Description |
|----------|--------|-----------|-------------|
| Startup (A) | `001-099` | Automatic at boot | Non-breaking DDL (CREATE IF NOT EXISTS, ADD COLUMN nullable) |
| Release (B) | `100-199` | Manual via CLI | Breaking changes (DROP, ALTER TYPE), require maintenance window |
| Seed | `S001-S999` | After schema | Reference data with ON CONFLICT DO NOTHING |
| Data (C) | `DM001-DM999` | Background job | Batched data transformations, resumable |

**Detailed documentation:** See [`docs/db/`](db/README.md) for full specification, coding rules, and phase-by-phase conversion tasks.

**Operations guide:** See [`docs/operations/postgresql-guide.md`](operations/postgresql-guide.md) for performance tuning, monitoring, backup/restore, and scaling.

**Retention**

* RustFS applies retention via `X-RustFS-Retain-Seconds`; Scanner.WebService GC decrements `refCount` and deletes unreferenced metadata; S3/MinIO fallback retains native Object Lock when enabled.
* PostgreSQL retention managed via time-based partitioning for high-volume tables (runs, execution_logs) with monthly partition drops.

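
The prefix ranges in the migration categories table above lend themselves to a simple classifier. The sketch below assumes migration files are named with the prefix first (for example `001_init.sql`); that naming scheme, the function name, and the returned labels are illustrative assumptions, not the project's migration runner.

```python
import re

# Optional "DM" or "S" marker followed by exactly three digits.
_PATTERN = re.compile(r"^(DM|S)?(\d{3})(?!\d)")


def classify_migration(filename: str) -> str:
    """Map a migration file name to its category from the table above."""
    match = _PATTERN.match(filename)
    if not match:
        raise ValueError(f"unrecognised migration name: {filename}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix == "DM" and number >= 1:
        return "data"      # DM001-DM999: background job
    if prefix == "S" and number >= 1:
        return "seed"      # S001-S999: after schema
    if prefix is None and 1 <= number <= 99:
        return "startup"   # 001-099: automatic at boot
    if prefix is None and 100 <= number <= 199:
        return "release"   # 100-199: manual via CLI
    raise ValueError(f"prefix outside the documented ranges: {filename}")
```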
---

## 7) APIs (consolidated surface)

### 7.1 Scanner.WebService

```
POST /api/scans                  { imageRef|digest, force? } → { scanId }
GET  /api/scans/{id}             → { status, digests, artifacts[] }
GET  /api/sboms/{imageDigest}    ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET  /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports                { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports                { imageDigest, policyRevision?, vexSnapshot? } → { reportId, verdict, rekorUrl }
GET  /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
GET  /healthz | /readyz | /metrics
```

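
As a client-side illustration of the SBOM endpoint listed above, the helper below builds the request URL and rejects values outside the documented `format` and `view` sets. The function name and the base URL in the usage note are hypothetical; the path and query parameters come from the listing.

```python
from urllib.parse import quote, urlencode

FORMATS = {"cdx-json", "cdx-pb", "spdx-json"}
VIEWS = {"inventory", "usage"}


def sbom_url(base: str, image_digest: str,
             fmt: str = "cdx-json", view: str = "inventory") -> str:
    """Build the GET /api/sboms/{imageDigest} URL for a given format and view."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if view not in VIEWS:
        raise ValueError(f"unsupported view: {view}")
    path = "/api/sboms/" + quote(image_digest, safe=":")
    query = urlencode({"format": fmt, "view": view})
    return f"{base.rstrip('/')}{path}?{query}"
```

For example, `sbom_url("https://scanner.example.internal", "sha256:abc", "cdx-pb", "usage")` yields a URL ending in `/api/sboms/sha256:abc?format=cdx-pb&view=usage`.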
### 7.2 Signer (mTLS; hard gate)

```
POST /sign/dsse                                # body: {subjectHash, imageDigest, predicate}; headers: OpTok (DPoP/mTLS) + PoE
GET  /verify/referrers?imageDigest=sha256:...  # is this image StellaOps-signed?
```

### 7.3 Attestor (mTLS)

```
POST /rekor/entries         # DSSE bundle → {uuid, index, proof, logURL}
GET  /rekor/entries/{uuid}
```

### 7.4 Authority (OIDC)

* `/.well-known/openid-configuration`, `/oauth/token` (DPoP/mTLS), `/oauth/introspect`, `/jwks`

### 7.5 Licensing (cloud)

```
POST /license/enroll      { LT, pubKey } → PoE + introspection endpoints
POST /license/revoke      { license_id } → ok
POST /license/introspect  { poe } → { active, claims, exp }
POST /attest/endorse      { bundle } → endorsement bundle (optional)
```

### 7.6 Scheduler

```
POST /api/v1/scheduler/schedules  {yaml|json} → { scheduleId }
GET  /api/v1/scheduler/schedules  → [ { id, nextRun, status, stats } ]
POST /api/v1/scheduler/run        { id|selector } → { runId }
GET  /api/v1/scheduler/runs/{id}  → { status, counts, links }
GET  /api/v1/scheduler/cursor     → { lastConcelierExportId, lastExcititorExportId }
```

### 7.7 Notify

```
POST /api/v1/notify/test        { channel, target } → { delivered }
POST /api/v1/notify/rules       {yaml|json} → { ruleId }
GET  /api/v1/notify/rules       → [ { id, match, actions, enabled } ]
GET  /api/v1/notify/deliveries  → [ { id, eventId, channel, status, attempts } ]
```

---

## 8) Security & verifiability

* **Sender‑constrained tokens.** All operational calls use **DPoP** (RFC 9449) or **mTLS‑bound** tokens (RFC 8705).
* **Entitlement.** **PoE** is mandatory; revocation honored online.
* **Release integrity.** **Signer** independently verifies **scanner image digest** via **Referrers + cosign** before signing.
* **Separation of duties.** Scanner/UI/Scheduler/Notify cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
* **Verifiers.** Anyone can verify: DSSE signature → certificate chain to **Stella Ops Fulcio/KMS root** → **Rekor v2** inclusion.
* **RBAC.** Roles: `scanner.admin|read`, `scheduler.admin|read`, `notify.admin|read`, `zastava.admin|read`.
* **Community vs Authorized.** Free/community runs throttled with no official attestations; authorized runs full speed and produce **Stella Ops‑verified** bundles.

**DSSE predicate (SBOM/report)**

```json
{
  "predicateType": "https://stella-ops.org/attestations/sbom/1",
  "subject": [{ "name": "s3://stellaops/images/<digest>/inventory.cdx.pb", "digest": { "sha256": "<sha256>" } }],
  "predicate": {
    "image_digest": "<sha256:...>",
    "stellaops_version": "2.3.1 (2027.04)",
    "license_id": "LIC-9F2A...",
    "customer_id": "CUST-ACME",
    "plan": "pro",
    "policy_digest": "sha256:...",
    "views": ["inventory","usage"],
    "created": "2025-10-17T12:34:56Z"
  }
}
```

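
For readers verifying such an attestation by hand: a DSSE signature does not cover the raw payload but its pre-authentication encoding (PAE), as defined by the DSSE specification. The sketch below shows only that encoding step; the function name is ours, and signing or certificate-chain checks are out of scope.

```python
def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the byte string that gets signed.

    PAE = "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload,
    where LEN is the ASCII decimal byte length.
    """
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload,
    )
```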
**BOM‑Index sidecar**

Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags for fast policy joins.

---

## 9) Scale, performance & quotas

* **Workers:** horizontal; **distributed lock per layer digest**; global CAS in the object store (RustFS by default; S3/MinIO on legacy installs).
* **Queues:** Redis Streams / NATS / RabbitMQ. HPA by queue depth, CPU, memory.
* **Registry throttling:** per‑registry concurrency budgets.
* **Targets:**

  * Build‑time path P95 ≤ 3–5 s on warmed bases.
  * Post‑build delta scan P95 ≤ 10 s for 200 MB images.
  * Policy + VEX evaluation ≤ 500 ms for 5k components using BOM‑Index.
  * **Event → notification** P95 ≤ **30–60 s** under nominal load.
  * **Export delta → re‑evaluation verdict** P95 ≤ **5 min** for 10k impacted images.
* **Quotas:** license plan enforces QPS/concurrency/size; **Signer** throttles and can deny DSSE.

---

## 10) DevOps & distribution

* **Releases:** all first‑party images **cosign‑signed**; labels embed `org.stellaops.version` and `org.stellaops.release_date`.
* **Channels:**

  * **Community** (public registry): throttled, non‑attesting.
  * **Authorized** (private registry): full speed, DSSE enabled.
* **Client update flow:** containers self‑verify signatures at boot; report version; **Signer** enforces `valid_release_year` / `max_version` from PoE before signing.
* **Compose skeleton:**

```yaml
services:
  authority: { image: stellaops/authority, depends_on: [postgres] }
  fulcio: { image: sigstore/fulcio }
  rekor: { image: sigstore/rekor-v2 }
  minio: { image: minio/minio, command: server /data --console-address ":9001" }
  postgres: { image: postgres:16-alpine, environment: { POSTGRES_DB: stellaops, POSTGRES_USER: stellaops } }
  signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
  attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
  scanner-web: { image: stellaops/scanner-web, depends_on: [postgres, minio, signer, attestor] }
  scanner-worker: { image: stellaops/scanner-worker, deploy: { replicas: 4 }, depends_on: [scanner-web] }
  concelier: { image: stellaops/concelier-web, depends_on: [postgres] }
  excititor: { image: stellaops/excititor-web, depends_on: [postgres] }
  scheduler-web: { image: stellaops/scheduler-web, depends_on: [postgres] }
  scheduler-worker: { image: stellaops/scheduler-worker, deploy: { replicas: 2 }, depends_on: [scheduler-web] }
  notify-web: { image: stellaops/notify-web, depends_on: [postgres] }
  notify-worker: { image: stellaops/notify-worker, deploy: { replicas: 2 }, depends_on: [notify-web] }
  ui: { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor, scheduler-web, notify-web] }
```

* **Binary prerequisites (offline-first):**

  * NuGet packages restore from standard feeds configured in `nuget.config` (dotnet-public, nuget-mirror, nuget.org) to the global NuGet cache. For air-gapped environments, use `dotnet restore --source <offline-feed-path>` pointing to a local `.nupkg` mirror.
  * Non-NuGet binaries (plugins/CLIs/tools) are catalogued with SHA-256 in `vendor/manifest.json`; air-gap bundles are registered in `offline/feeds/manifest.json`.
  * CI guard: `scripts/verify-binaries.sh` blocks binaries outside approved roots; offline restores use `dotnet restore --source <offline-feed>` with `OFFLINE=1` (override via `ALLOW_REMOTE=1`).

* **Backups:** PostgreSQL dumps (pg_dump) and WAL archiving; RustFS snapshots (or S3 versioning when fallback driver is used); Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation. See [`docs/operations/postgresql-guide.md`](operations/postgresql-guide.md).
* **Ops runbooks:** Scheduler catch‑up after Concelier/Excititor recovery; connector key rotation (Slack/Teams/SMTP).
* **SLOs & alerts:** lag between Concelier/Excititor export and first rescan verdict; delivery failure rates by channel.

---

## 11) Observability & audit

* **Metrics:** scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
* **Scheduler metrics:** `scheduler.impacted_images_total`, `scheduler.jobs_enqueued_total`, `scheduler.selection_ms`, end‑to‑end p95 (event → verdict).
* **Notify metrics:** `notify.sent_total{channel}`, `notify.dropped_total{reason}`, `notify.digest_coalesced_total`, `notify.latency_ms`.
* **Tracing:** per‑stage spans; correlation IDs across Scanner→Signer→Attestor and Concelier/Excititor→Scheduler→Scanner→Notify.
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID; Scheduler records who scheduled what; Notify records where, when, and why messages were sent or deduped.
* **Compliance:** RustFS retention headers (or MinIO Object Lock when operating in S3 mode) keep immutable artifacts tamper‑resistant; reproducible outputs via policy digest + SBOM digest in predicate.

---

## 12) Roadmap (anchored to this architecture)

* M2: Windows MSI/SxS/GAC analyzers; deeper Rust (DWARF enrichers).
* M2: Buildx generator certified flows; cross‑registry trust policies.
* M3: Patch‑Presence plugin (signature‑based backport detection), opt‑in.
* M3: Zastava Admission control GA with policy presets and dry‑run→enforce stages.
* M3: **Scheduler GA** with export‑delta impact routing and capacity‑aware pacing.
* M3: **Notify GA** with digests, Slack/Teams/Email/Webhooks; **M4:** PagerDuty/Opsgenie connectors.
* Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.

---

## 13) Canonical sequences (verification, re‑evaluation & notify)

**Sign & log (OpTok + PoE, image verify, DSSE, Rekor).**

```mermaid
sequenceDiagram
  autonumber
  participant Scan as Scanner.WebService
  participant Auth as Authority (OIDC)
  participant Sign as Signer
  participant Reg as OCI Registry
  participant Ful as Fulcio/KMS
  participant Att as Attestor
  participant Rek as Rekor v2

  Scan->>Auth: Get OpTok (DPoP/mTLS)
  Scan->>Sign: sign(request) + OpTok + PoE + DPoP proof
  Sign->>Auth: Validate OpTok & sender-constraint
  Sign->>Sign: Validate PoE (introspect/revocation)
  Sign->>Reg: Verify scanner image is StellaOps-signed (Referrers + cosign)
  alt OK
    Sign->>Ful: Get signing cert (keyless) or use KMS key
    Sign-->>Scan: DSSE bundle (cert chain)
    Scan->>Att: Submit bundle
    Att->>Rek: Create entry
    Rek-->>Att: {uuid,index,proof}
    Att-->>Scan: Rekor URL
  else Deny
    Sign-->>Scan: 403 (no attestation)
  end
```

**Event‑driven re‑evaluation & notify.**

```mermaid
sequenceDiagram
  participant CONC as Concelier
  participant EXC as Excititor
  participant SCH as Scheduler
  participant SC as Scanner.WebService
  participant NO as Notify

  CONC->>SCH: export.delta {changedProductKeys, exportId}
  EXC->>SCH: export.delta {changedProductKeys, exportId}
  SCH->>SCH: Impact select via BOM-Index bitmaps
  SCH->>SC: Enqueue analysis-only reports (batches)
  SC-->>SCH: verdict stream (PASS/FAIL, deltas)
  SCH->>NO: rescan.delta {imageDigest, newCriticals, links}
  NO-->>Slack/Teams/Email/Webhook: deliver (throttle/digest rules applied)
```

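
The "impact select via BOM-Index bitmaps" step in the sequence above can be pictured with a toy inverted index. This is a sketch under stated assumptions: plain Python integers stand in for roaring bitmaps, changed product keys are treated as purls, and the class and method names are hypothetical. The on-disk BOM-Index format is not reproduced here.

```python
class BomIndex:
    """Toy inverted index: purl -> bitmap of images that contain it."""

    def __init__(self):
        self._digests = []   # bit position -> image digest
        self._position = {}  # image digest -> bit position
        self._bitmaps = {}   # purl -> int used as a bitmap

    def add_image(self, image_digest, purls):
        if image_digest not in self._position:
            self._position[image_digest] = len(self._digests)
            self._digests.append(image_digest)
        bit = 1 << self._position[image_digest]
        for purl in purls:
            self._bitmaps[purl] = self._bitmaps.get(purl, 0) | bit

    def impacted(self, changed_purls):
        """Union the bitmaps of all changed purls and list the matching images."""
        mask = 0
        for purl in changed_purls:
            mask |= self._bitmaps.get(purl, 0)
        return [d for i, d in enumerate(self._digests) if (mask >> i) & 1]
```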
---

## 14) Minimal data shapes (Scheduler & Notify)

**Scheduler schedule (YAML via UI/CLI)**

```yaml
name: nightly-eu
when: "0 2 * * * Europe/Sofia"
mode: analysis-only # or content-refresh
selection:
  scope: all-images # or tenant/ns/repo label selectors
  onlyIf: { lastReportOlderThanDays: 7 }
notify:
  onNewFindings: true
  minSeverity: high
limits:
  maxJobs: 5000
  ratePerSecond: 50
```

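
One way to read the `limits` block above: `maxJobs` caps how many images a run may enqueue, and `ratePerSecond` paces them. The sketch below assumes exactly that interpretation and models pacing as one batch per second; `plan_batches` is a hypothetical helper, not the Scheduler.Worker algorithm.

```python
def plan_batches(image_digests, max_jobs, rate_per_second):
    """Split the selected images into per-second batches under the given limits."""
    if max_jobs < 0 or rate_per_second <= 0:
        raise ValueError("max_jobs must be >= 0 and rate_per_second must be > 0")
    selected = list(image_digests)[:max_jobs]  # enforce the job cap first
    return [
        selected[start:start + rate_per_second]  # one batch per second
        for start in range(0, len(selected), rate_per_second)
    ]
```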
**Notify rule (YAML)**

```yaml
name: high-critical-alerts
match:
  eventKinds: ["report.ready","rescan.delta","zastava.admission"]
  minSeverity: high
  namespaces: ["prod-*"]
  vex: { includeAcceptedJustifications: false }
actions:
  - channel: slack
    target: "#sec-alerts"
    template: "concise"
    throttle: "5m"
  - channel: email
    target: "soc@acme.org"
    digest: "hourly"
enabled: true
```
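
The `match` block of such a rule can be evaluated with a small predicate. The event shape (`kind`, `severity`, `namespace`), the severity ordering, and the use of shell-style globbing for `namespaces` are assumptions made for this sketch; the `vex` option and throttling are left out.

```python
from fnmatch import fnmatchcase

# Assumed ordering from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def rule_matches(rule: dict, event: dict) -> bool:
    """Return True when an enabled rule's match block accepts the event."""
    match = rule["match"]
    if not rule.get("enabled", True):
        return False
    if event["kind"] not in match["eventKinds"]:
        return False
    # Severity must be at or above the rule's minimum.
    if SEVERITY_ORDER.index(event["severity"]) < SEVERITY_ORDER.index(match["minSeverity"]):
        return False
    # Namespace patterns such as "prod-*" are treated as shell-style globs.
    return any(fnmatchcase(event["namespace"], p) for p in match["namespaces"])
```

With the rule above, a `critical` `report.ready` event in namespace `prod-eu` matches, while the same event in `staging` does not.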