High‑Level Architecture — Stella Ops (Consolidated • 2025Q4)
Purpose. A complete, implementation‑ready map of Stella Ops: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
Scope. This file replaces the separate components.md; all component details now live here.
0) Product vision & principles
Vision. Stella Ops is a deterministic SBOM + VEX platform for CI/CD and runtime, tuned for speed (per‑layer deltas), quiet output (usage‑scoped views), and verifiability (DSSE + Rekor v2). It is self‑hostable, air‑gap capable, and commercially enforceable: only licensed installations can produce Stella Ops‑verified attestations.
Operating principles.
- Scanner‑owned SBOMs. We generate our own BOMs; we do not warehouse third‑party SBOM content (we can link to attested SBOMs).
- Deterministic evidence. Facts come from package DBs, installed metadata, linkers, and verified attestations; no fuzzy guessing in the core.
- Per‑layer caching. Cache fragments by layer digest and compose image SBOMs via CycloneDX BOM‑Link / SPDX ExternalRef.
- Inventory vs Usage. Always record the full inventory of what exists; separately present usage (entrypoint closure + loaded libs).
- Backend decides. PASS/FAIL is produced by Policy + VEX + Advisories. The scanner reports facts.
- Attest or it didn’t happen. Every export is signed as in‑toto/DSSE and logged in Rekor v2.
- Sovereign‑ready. Cloud is used only for licensing and optional endorsement; everything else is first‑party and self‑hostable.
1) Service topology & trust boundaries
1.1 Runtime inventory (first‑party)
| Service / Tool | Container image | Core role | Scale pattern |
|---|---|---|---|
| Scanner.WebService | stellaops/scanner-web | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports; analysis‑only report runs for Scheduler. | Stateless; N replicas behind LB. |
| Scanner.Worker | stellaops/scanner-worker | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach‑O, EntryTrace); emits per‑layer SBOMs and composes image SBOMs. | Horizontal; queue‑driven; sharded by layer digest. |
| Scanner.Sbomer.BuildXPlugin | stellaops/sbom-indexer | BuildKit generator for build‑time SBOMs as OCI referrers. | CI‑side; ephemeral. |
| Scanner.Sbomer.DockerImage | stellaops/scanner-cli | CLI‑orchestrated scanner container for post‑build scans. | Local/CI; ephemeral. |
| Concelier.WebService | stellaops/concelier-web | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| Excititor.WebService | stellaops/excititor-web | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| Policy Engine | (in scanner-web) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage‑gating); produces policy digest. | In‑process; cache per digest. |
| Scheduler.WebService | stellaops/scheduler-web | Schedules re‑evaluation runs; consumes Concelier/Excititor deltas; selects impacted images via BOM‑Index; orchestrates analysis‑only reports. | Stateless API. |
| Scheduler.Worker | stellaops/scheduler-worker | Executes selection and enqueues batches toward Scanner; enforces rate/limits and windows; maintains impact cursors. | Horizontal; queue‑driven. |
| Notify.WebService | stellaops/notify-web | Rules engine for outbound notifications; manages channels, templates, throttle/digest logic. | Stateless API. |
| Notify.Worker | stellaops/notify-worker | Delivers to Slack/Teams/Email/Webhooks; idempotent retries; digests. | Horizontal; per‑channel rate limits. |
| Signer | stellaops/signer | Hard gate: validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| Attestor | stellaops/attestor | Posts DSSE bundles to Rekor v2; verification endpoints. | Stateless; HPA by QPS. |
| Authority | stellaops/authority | On‑prem OIDC issuing short‑lived OpToks with DPoP/mTLS sender constraint. | HA behind LB. |
| Zastava (Runtime) | stellaops/zastava | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| Web UI | stellaops/ui | Angular app for scans, diffs, policy, VEX, Scheduler, Notify, runtime, reports. | Stateless. |
| StellaOps.Cli | stellaops/cli | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; schedule and notify verbs. | Local/CI. |
1.2 Third‑party (self‑hosted)
- Fulcio (Sigstore CA) — issues short‑lived signing certs (keyless).
- Rekor v2 (tile‑backed transparency log).
- RustFS — offline-first object store with deterministic REST API (S3/MinIO fallback available for legacy installs).
- MongoDB — catalog, advisories, VEX, scheduler, notify.
- Queue — Redis Streams / NATS / RabbitMQ (pluggable).
- OCI Registry — must support Referrers API (discover SBOMs/signatures).
1.3 Cloud licensing (Stella Ops)
- Licensing Service (www.stella-ops.org): issues long‑lived License Tokens (LT); exchanges LT → Proof‑of‑Entitlement (PoE) bound to an installation key; revoke/introspect PoE; optional cross‑log endorsement.
1.4 Diagram (control/data planes & trust)
flowchart LR
subgraph Cloud["www.stella-ops.org (Cloud)"]
LS[Licensing Service<br/>LT→PoE / revoke / introspect]
end
subgraph OnPrem["Customer Site (Self-hosted)"]
Auth["Authority (OIDC)<br/>OpTok (DPoP/mTLS)"]
SW[Scanner.WebService]
WK[Scanner.Worker xN]
CONC[Concelier]
EXC[Excititor]
SCHW[Scheduler.Web]
SCH[Scheduler.Worker xN]
NOTW[Notify.Web]
NOT[Notify.Worker xN]
POL["Policy Engine (in Scanner.Web)"]
SGN["Signer<br/>(entitlement + signing)"]
ATT["Attestor<br/>(Rekor v2 submit/verify)"]
UI["Web UI (Angular)"]
Z["Zastava<br/>(Runtime Inspector/Enforcer)"]
RFS[(RustFS object store)]
MGO[(MongoDB)]
QUE[(Queue/Streams)]
end
CLI[StellaOps.Cli / Buildx Plugin]
REG[(OCI Registry with Referrers)]
FUL[ Fulcio ]
REK["Rekor v2 (tiles)"]
CLI -->|scan/build| SW
SW -->|jobs| QUE
QUE --> WK
WK --> RFS
SW --> MGO
CONC --> MGO
EXC --> MGO
UI --> SW
Z --> SW
%% New event-driven loop
CONC -- export.delta --> SCHW
EXC -- export.delta --> SCHW
SCHW --> SCH
SCH --> SW
SW -- report.ready --> NOTW
Z -- admission/observe --> NOTW
SGN <--> Auth
SGN --> FUL
SGN -->|mTLS| ATT
ATT --> REK
SGN <-->|verify referrers| REG
Trust boundaries. Only Signer can sign; only Attestor can write to Rekor v2. Scanner/UI/Scheduler/Notify never sign.
2) Licensing & tokens (installation‑ready, theft‑resistant)
Two‑token model.
- License Token (LT): long‑lived JWT from Licensing Service; used once to enroll the installation; never used in hot path.
- Proof‑of‑Entitlement (PoE): bound to the installation key (mTLS client cert or DPoP‑bound JWT with cnf); medium‑lived; renewable; revocable.
- Operational token (OpTok): 2–5 min OIDC token from Authority, sender‑constrained (DPoP or mTLS). Used to authenticate to Signer/Scanner.WebService/Scheduler.Web/Notify.Web.
Signer enforces both: PoE proves entitlement; OpTok proves “who is calling now”. It also independently verifies the scanner image digest is Stella Ops‑signed via Referrers + cosign before signing anything.
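The gate described above can be sketched as a pure decision function. This is a minimal illustration, not the Signer's actual code: the SignRequest type, its field names, and the reason strings are hypothetical, and each boolean stands in for a real check (OpTok validation, PoE introspection, Referrers + cosign verification).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignRequest:
    """Hypothetical summary of the checks Signer runs before signing."""
    optok_valid: bool            # OpTok verified and sender-constrained (DPoP/mTLS)
    poe_active: bool             # PoE introspected and not revoked
    scanner_image_signed: bool   # scanner image passed Referrers + cosign check


def signer_gate(req: SignRequest) -> tuple[bool, str]:
    """Return (allowed, reason); all three checks must pass."""
    if not req.optok_valid:
        return False, "invalid or unbound OpTok"
    if not req.poe_active:
        return False, "PoE missing, expired, or revoked"
    if not req.scanner_image_signed:
        return False, "scanner image is not Stella Ops-signed"
    return True, "ok"
```

The checks are independent on purpose: a valid caller identity (OpTok) without entitlement (PoE) is denied, and vice versa.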
Enrollment sequence (LT → PoE).
@startuml
actor Operator
participant "Install Agent" as IA
participant "Licensing Service" as LS
Operator -> IA: Provide LT
IA -> IA: Generate K_inst
IA -> LS: /license/enroll {LT, pub(K_inst)}
LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
@enduml
3) Scanner subsystem (facts engine)
3.1 Analyzers (deterministic only)
- OS packages: apk/dpkg/rpm (Linux); Windows MSI/SxS/GAC (M2).
- Language (installed state):
  - Java (pom.properties / MANIFEST) → pkg:maven/...
  - Node (node_modules/*/package.json) → pkg:npm/...
  - Python (*.dist-info/METADATA) → pkg:pypi/...
  - Go (buildinfo) → pkg:golang/...
  - .NET (*.deps.json) → pkg:nuget/...
  - Rust: deterministic language markers (symbol mangling) and crates only when present; otherwise bin:{sha256}.
- Native: ELF/PE/Mach‑O imports, DT_NEEDED, RPATH/RUNPATH, symbol versions, PE version info.
- EntryTrace: parse ENTRYPOINT/CMD; shell AST; resolve launchers (Java/Node/Python) to terminal program; record file:line chain.
3.2 Caching & composition
- Layer cache: {layerDigest → SBOM fragment + analyzer meta}.
- File CAS: {sha256(file) → parse result (ELF/JAR metadata/etc.)}.
- Composition: build image SBOMs from fragments via BOM‑Link/ExternalRef; emit two views:
  - Inventory (complete filesystem inventory).
  - Usage (entrypoint closure + linked libs).
- Transport: JSON and CycloneDX Protobuf (compact, fast to parse).
- Index: BOM‑Index sidecar with purl table + roaring bitmap + usedByEntrypoint flag for fast joins.
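A rough sketch of the layer cache and two-view composition, under simplifying assumptions: fragments are plain dicts with a hypothetical "components" list, each component carries a hypothetical "used" flag, and composition copies components instead of emitting BOM‑Link/ExternalRef references as the real system does.

```python
from typing import Callable

Fragment = dict  # hypothetical shape: {"components": [{"purl": str, "used": bool}]}


class LayerCache:
    """Caches one SBOM fragment per layer digest; analyzes only on a miss."""

    def __init__(self, analyze: Callable[[str], Fragment]):
        self._analyze = analyze
        self._fragments: dict[str, Fragment] = {}

    def fragment(self, layer_digest: str) -> Fragment:
        if layer_digest not in self._fragments:
            self._fragments[layer_digest] = self._analyze(layer_digest)
        return self._fragments[layer_digest]


def compose(cache: LayerCache, layer_digests: list[str]) -> dict:
    """Build both views of an image SBOM from cached per-layer fragments."""
    inventory = []
    for digest in layer_digests:
        inventory.extend(cache.fragment(digest)["components"])
    # Usage is the subset of the inventory reachable from the entrypoint.
    usage = [c for c in inventory if c["used"]]
    return {"inventory": inventory, "usage": usage}
```

Two images that share a base layer trigger the analyzer once for that layer, which is where the per‑layer delta speedup comes from.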
3.3 Diff (image → layer → package)
- Added / Removed / Version‑changed changes, attributed to the layer that caused them.
- Raw diffs preserved; backend view applies VEX + Policy.
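As an illustration of the raw diff, the sketch below compares two package maps and tags each change with a layer digest. The input shape (name → (version, layerDigest)) is an assumption, and removals are attributed to the old layer that carried the package, which is a simplification of "the layer that caused them".

```python
def diff_packages(old: dict, new: dict) -> dict:
    """Diff two {name: (version, layer_digest)} maps into added/removed/changed."""
    added = [
        {"name": n, "version": v, "layer": layer}
        for n, (v, layer) in sorted(new.items())
        if n not in old
    ]
    removed = [
        {"name": n, "version": v, "layer": layer}
        for n, (v, layer) in sorted(old.items())
        if n not in new
    ]
    changed = [
        {"name": n, "from": old[n][0], "to": v, "layer": layer}
        for n, (v, layer) in sorted(new.items())
        if n in old and old[n][0] != v
    ]
    return {"added": added, "removed": removed, "changed": changed}
```

No VEX or policy is applied here; per the text above, that happens in the backend view.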
3.4 Build‑time SBOMs (fast CI path)
- Buildx generator runs analyzers during docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer and attaches SBOMs as OCI referrers.
- Scanner.WebService can trust these (policy‑configurable) and skip re‑scan; DSSE + Rekor v2 can be done either at build time or post‑push via Signer/Attestor.
3.5 Events / integrations
- Out: report.ready (summary + verdict + Rekor UUID) → internal bus for Notify & UI.
- Expose: image‑level BOM‑Index metadata for Scheduler impact selection.
4) Backend evaluation (decider)
4.1 Concelier (advisories)
- Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in Mongo; exports deterministic JSON and Trivy DB.
- Offline kit bundles for air‑gapped sites.
4.2 Excititor (VEX)
- Ingests OpenVEX / CSAF VEX / CycloneDX VEX; normalizes claims; retains conflicts; computes consensus with provider trust weights and justification gates.
4.3 Policy Engine (YAML DSL)
- Matchers: image/repo/env/purl/cve/vendor/source/path/layerDigest/usedByEntrypoint
- Actions: ignore(until, justification), fail, warn, defer, requireVEX{vendors, justifications}, escalate{sev, KEV, EPSS}, license constraints.
- Produces a policy digest (SHA‑256 of canonicalized policy).
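One way to compute such a digest is sketched below. The canonical form used here (the parsed policy serialized as compact JSON with sorted keys) is an assumption for illustration; the document does not specify the actual canonicalization rules.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over a canonical serialization of an already-parsed policy."""
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    canonical = json.dumps(
        policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because keys are sorted before hashing, two policies that differ only in key order or whitespace yield the same digest, so the digest can serve as a cache key and be recorded in attestations.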
4.4 PASS/FAIL flow
- SBOM (Inventory / Usage) → join with Concelier advisories.
- Apply Excititor consensus (statuses & justifications).
- Apply Policy; compute PASS/FAIL with waiver TTLs.
- Sign the final report (DSSE via Signer) and log to Rekor v2 via Attestor.
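The first three steps of this flow can be sketched as follows. Everything concrete here is an assumption made for illustration: the tuple shapes, the rule that only high/critical findings block, and waivers keyed by CVE with an expiry time. The "not_affected" status is the standard OpenVEX/CSAF value; signing and Rekor logging are out of scope.

```python
from datetime import datetime, timezone

BLOCKING_SEVERITIES = {"high", "critical"}  # assumed policy threshold


def evaluate(findings, vex_status, waivers, now):
    """Return (verdict, blocking findings).

    findings:   list of (purl, cve, severity) from the SBOM/advisory join
    vex_status: {(purl, cve): status} from VEX consensus
    waivers:    {cve: expiry datetime}; a waiver applies only before expiry
    """
    blocking = []
    for purl, cve, severity in findings:
        if vex_status.get((purl, cve)) == "not_affected":
            continue  # suppressed by VEX consensus
        expiry = waivers.get(cve)
        if expiry is not None and now < expiry:
            continue  # waiver still within its TTL
        if severity in BLOCKING_SEVERITIES:
            blocking.append((purl, cve))
    return ("FAIL" if blocking else "PASS"), blocking
```

Note that an expired waiver changes the verdict without any change to the SBOM, which is one reason scheduled re‑evaluation exists.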
5) Runtime enforcement (Zastava)
- Observer: inventories running containers, checks image signatures, SBOM presence (referrers), detects drift (entrypoint chain divergence), flags unapproved images.
- Admission Webhook (optional): blocks policy‑fail pods (dry‑run first).
- Integration: posts runtime events to Scanner.WebService; can request delta scans on changed layers.
6) Storage & catalogs (RustFS/Mongo)
RustFS layout (default)
rustfs://stellaops/
layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin
attest/<artifactSha256>.dsse.json
Catalog (Mongo)
- artifacts (type/format/sha/size/rekor/ttl/immutable/refCount/createdAt), images, layers, links, lifecycleRules
- Scheduler: schedules, runs, locks, impact_cursors
- Notify: rules, deliveries, channels, templates
Retention
- RustFS applies retention via X-RustFS-Retain-Seconds; Scanner.WebService GC decrements refCount and deletes unreferenced metadata; S3/MinIO fallback retains native Object Lock when enabled.
7) APIs (consolidated surface)
7.1 Scanner.WebService
POST /api/scans { imageRef|digest, force? } → { scanId }
GET /api/scans/{id} → { status, digests, artifacts[] }
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports { imageDigest, policyRevision?, vexSnapshot? } → { reportId, verdict, rekorUrl }
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
GET /healthz | /readyz | /metrics
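A small client-side helper for the SBOM endpoint above might look like this. The base URL is a placeholder and the helper itself is hypothetical; only the path and the format/view values come from the endpoint list.

```python
from urllib.parse import quote, urlencode

FORMATS = {"cdx-json", "cdx-pb", "spdx-json"}
VIEWS = {"inventory", "usage"}


def sbom_url(base: str, image_digest: str,
             fmt: str = "cdx-json", view: str = "inventory") -> str:
    """Build the GET /api/sboms/{imageDigest} URL with validated parameters."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if view not in VIEWS:
        raise ValueError(f"unsupported view: {view}")
    path = quote(image_digest, safe=":")  # keep the 'sha256:' prefix readable
    return f"{base}/api/sboms/{path}?{urlencode({'format': fmt, 'view': view})}"
```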
7.2 Signer (mTLS; hard gate)
POST /sign/dsse # body: {subjectHash, imageDigest, predicate}; headers: OpTok (DPoP/mTLS) + PoE
GET /verify/referrers?imageDigest=sha256:... # is this image StellaOps-signed?
7.3 Attestor (mTLS)
POST /rekor/entries # DSSE bundle → {uuid, index, proof, logURL}
GET /rekor/entries/{uuid}
7.4 Authority (OIDC)
/.well-known/openid-configuration, /oauth/token (DPoP/mTLS), /oauth/introspect, /jwks
7.5 Licensing (cloud)
POST /license/enroll { LT, pubKey } → PoE + introspection endpoints
POST /license/revoke { license_id } → ok
POST /license/introspect { poe } → { active, claims, exp }
POST /attest/endorse { bundle } → endorsement bundle (optional)
7.6 Scheduler
POST /api/v1/scheduler/schedules {yaml|json} → { scheduleId }
GET /api/v1/scheduler/schedules → [ { id, nextRun, status, stats } ]
POST /api/v1/scheduler/run { id|selector } → { runId }
GET /api/v1/scheduler/runs/{id} → { status, counts, links }
GET /api/v1/scheduler/cursor → { lastConcelierExportId, lastExcititorExportId }
7.7 Notify
POST /api/v1/notify/test { channel, target } → { delivered }
POST /api/v1/notify/rules {yaml|json} → { ruleId }
GET /api/v1/notify/rules → [ { id, match, actions, enabled } ]
GET /api/v1/notify/deliveries → [ { id, eventId, channel, status, attempts } ]
8) Security & verifiability
- Sender‑constrained tokens. All operational calls use DPoP (RFC 9449) or mTLS‑bound tokens (RFC 8705).
- Entitlement. PoE is mandatory; revocation honored online.
- Release integrity. Signer independently verifies scanner image digest via Referrers + cosign before signing.
- Separation of duties. Scanner/UI/Scheduler/Notify cannot sign; only Signer can sign; only Attestor can write to Rekor v2.
- Verifiers. Anyone can verify: DSSE signature → certificate chain to Stella Ops Fulcio/KMS root → Rekor v2 inclusion.
- RBAC. Roles: scanner.admin|read, scheduler.admin|read, notify.admin|read, zastava.admin|read.
- Community vs Authorized. Free/community runs throttled with no official attestations; authorized runs full speed and produce Stella Ops‑verified bundles.
DSSE predicate (SBOM/report)
{
"predicateType": "https://stella-ops.org/attestations/sbom/1",
"subject": [{ "name": "s3://stellaops/images/<digest>/inventory.cdx.pb", "digest": { "sha256": "<sha256>" } }],
"predicate": {
"image_digest": "<sha256:...>",
"stellaops_version": "2.3.1 (2027.04)",
"license_id": "LIC-9F2A...",
"customer_id": "CUST-ACME",
"plan": "pro",
"policy_digest": "sha256:...",
"views": ["inventory","usage"],
"created": "2025-10-17T12:34:56Z"
}
}
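DSSE signatures are computed over a pre‑authentication encoding (PAE) of the payload type and payload bytes, not over the raw JSON. The sketch below implements PAE as defined in the DSSE specification; how Signer serializes the statement before encoding is not stated in this document, so the serialization in the usage lines is an assumption.

```python
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed.

    Format: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where LEN is the ASCII decimal byte length.
    """
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload
    )


# Usage: encode an in-toto statement like the one above (fields abbreviated).
statement = {
    "predicateType": "https://stella-ops.org/attestations/sbom/1",
    "predicate": {"views": ["inventory", "usage"]},
}
to_sign = pae("application/vnd.in-toto+json",
              json.dumps(statement, sort_keys=True).encode("utf-8"))
```

Binding the payload type into the signed bytes prevents a signature over one kind of document from being replayed as another.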
BOM‑Index sidecar
Binary header + purl table + roaring bitmaps; optional usedByEntrypoint flags for fast policy joins.
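To show why bitmaps make impact selection cheap, the sketch below keeps one bitmap per purl across a set of images and ORs them to find impacted images. It is a conceptual stand-in only: Python integers play the role of roaring bitmaps, the class and method names are hypothetical, and the real sidecar is a binary per-image file.

```python
class BomIndex:
    """Toy inverted index: purl -> bitmap of image ids containing it."""

    def __init__(self) -> None:
        self._images: list[str] = []          # image id -> image digest
        self._purl_bits: dict[str, int] = {}  # purl -> bitmap over image ids

    def add_image(self, digest: str, purls: list[str]) -> None:
        image_id = len(self._images)
        self._images.append(digest)
        for purl in purls:
            self._purl_bits[purl] = self._purl_bits.get(purl, 0) | (1 << image_id)

    def impacted(self, changed_purls: list[str]) -> list[str]:
        """Images containing at least one changed purl (bitwise OR of bitmaps)."""
        bits = 0
        for purl in changed_purls:
            bits |= self._purl_bits.get(purl, 0)
        return [d for i, d in enumerate(self._images) if (bits >> i) & 1]
```

Restricting the same join to components with the usedByEntrypoint flag set would yield the usage‑scoped selection.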
9) Scale, performance & quotas
- Workers: horizontal; distributed lock per layer digest; global CAS in RustFS (S3/MinIO on legacy installs).
- Queues: Redis Streams / NATS / RabbitMQ. HPA by queue depth, CPU, memory.
- Registry throttling: per‑registry concurrency budgets.
- Targets:
  - Build‑time path P95 ≤ 3–5 s on warmed bases.
  - Post‑build delta scan P95 ≤ 10 s for 200 MB images.
  - Policy + VEX evaluation ≤ 500 ms for 5k components using BOM‑Index.
  - Event → notification P95 ≤ 30–60 s under nominal load.
  - Export delta → re‑evaluation verdict P95 ≤ 5 min for 10k impacted images.
- Quotas: license plan enforces QPS/concurrency/size; Signer throttles and can deny DSSE.
10) DevOps & distribution
- Releases: all first‑party images cosign‑signed; labels embed org.stellaops.version and org.stellaops.release_date.
- Channels:
  - Community (public registry): throttled, non‑attesting.
  - Authorized (private registry): full speed, DSSE enabled.
- Client update flow: containers self‑verify signatures at boot; report version; Signer enforces valid_release_year / max_version from PoE before signing.
- Compose skeleton:
services:
  authority: { image: stellaops/authority }
  fulcio: { image: sigstore/fulcio }
  rekor: { image: sigstore/rekor-v2 }
  minio: { image: minio/minio, command: server /data --console-address ":9001" }
  mongo: { image: mongo:7 }
  signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
  attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
  scanner-web: { image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
  scanner-worker: { image: stellaops/scanner-worker, deploy: { replicas: 4 }, depends_on: [scanner-web] }
  concelier: { image: stellaops/concelier-web, depends_on: [mongo] }
  excititor: { image: stellaops/excititor-web, depends_on: [mongo] }
  scheduler-web: { image: stellaops/scheduler-web, depends_on: [mongo] }
  scheduler-worker: { image: stellaops/scheduler-worker, deploy: { replicas: 2 }, depends_on: [scheduler-web] }
  notify-web: { image: stellaops/notify-web, depends_on: [mongo] }
  notify-worker: { image: stellaops/notify-worker, deploy: { replicas: 2 }, depends_on: [notify-web] }
  ui: { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor, scheduler-web, notify-web] }
- Backups: Mongo dumps; RustFS snapshots (or S3 versioning when fallback driver is used); Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation.
- Ops runbooks: Scheduler catch‑up after Concelier/Excititor recovery; connector key rotation (Slack/Teams/SMTP).
- SLOs & alerts: lag between Concelier/Excititor export and first rescan verdict; delivery failure rates by channel.
11) Observability & audit
- Metrics: scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
- Scheduler metrics: scheduler.impacted_images_total, scheduler.jobs_enqueued_total, scheduler.selection_ms, end‑to‑end p95 (event → verdict).
- Notify metrics: notify.sent_total{channel}, notify.dropped_total{reason}, notify.digest_coalesced_total, notify.latency_ms.
- Tracing: per‑stage spans; correlation IDs across Scanner→Signer→Attestor and Concelier/Excititor→Scheduler→Scanner→Notify.
- Audit logs: every signing records license_id, image_digest, policy_digest, and Rekor UUID; Scheduler records who scheduled what; Notify records where, when, and why messages were sent or deduped.
- Compliance: RustFS retention headers (or MinIO Object Lock when operating in S3 mode) keep immutable artifacts tamper‑resistant; reproducible outputs via policy digest + SBOM digest in predicate.
12) Roadmap (anchored to this architecture)
- M2: Windows MSI/SxS/GAC analyzers; deeper Rust (DWARF enrichers).
- M2: Buildx generator certified flows; cross‑registry trust policies.
- M3: Patch‑Presence plugin (signature‑based backport detection), opt‑in.
- M3: Zastava Admission control GA with policy presets and dry‑run→enforce stages.
- M3: Scheduler GA with export‑delta impact routing and capacity‑aware pacing.
- M3: Notify GA with digests, Slack/Teams/Email/Webhooks; M4: PagerDuty/Opsgenie connectors.
- Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.
13) Canonical sequences (verification, re‑evaluation & notify)
Sign & log (OpTok + PoE, image verify, DSSE, Rekor).
sequenceDiagram
autonumber
participant Scan as Scanner.WebService
participant Auth as Authority (OIDC)
participant Sign as Signer
participant Reg as OCI Registry
participant Ful as Fulcio/KMS
participant Att as Attestor
participant Rek as Rekor v2
Scan->>Auth: Get OpTok (DPoP/mTLS)
Scan->>Sign: sign(request) + OpTok + PoE + DPoP proof
Sign->>Auth: Validate OpTok & sender-constraint
Sign->>Sign: Validate PoE (introspect/revocation)
Sign->>Reg: Verify scanner image is StellaOps-signed (Referrers + cosign)
alt OK
Sign->>Ful: Get signing cert (keyless) or use KMS key
Sign-->>Scan: DSSE bundle (cert chain)
Scan->>Att: Submit bundle
Att->>Rek: Create entry
Rek-->>Att: {uuid,index,proof}
Att-->>Scan: Rekor URL
else Deny
Sign-->>Scan: 403 (no attestation)
end
Event‑driven re‑evaluation & notify.
sequenceDiagram
participant CONC as Concelier
participant EXC as Excititor
participant SCH as Scheduler
participant SC as Scanner.WebService
participant NO as Notify
CONC->>SCH: export.delta {changedProductKeys, exportId}
EXC ->>SCH: export.delta {changedProductKeys, exportId}
SCH->>SCH: Impact select via BOM-Index bitmaps
SCH->>SC: Enqueue analysis-only reports (batches)
SC-->>SCH: verdict stream (PASS/FAIL, deltas)
SCH->>NO: rescan.delta {imageDigest, newCriticals, links}
NO-->>Slack/Teams/Email/Webhook: deliver (throttle/digest rules applied)
14) Minimal data shapes (Scheduler & Notify)
Scheduler schedule (YAML via UI/CLI)
name: nightly-eu
when: "0 2 * * * Europe/Sofia"
mode: analysis-only # or content-refresh
selection:
  scope: all-images # or tenant/ns/repo label selectors
  onlyIf: { lastReportOlderThanDays: 7 }
notify:
  onNewFindings: true
  minSeverity: high
limits:
  maxJobs: 5000
  ratePerSecond: 50
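One plausible reading of the limits block is sketched here: maxJobs caps the run and ratePerSecond sets the size of each one‑second batch. The function name and this interpretation are assumptions; the document does not define how Scheduler.Worker paces enqueues.

```python
def plan_batches(image_digests: list[str], max_jobs: int,
                 rate_per_second: int) -> list[list[str]]:
    """Cap the selection at max_jobs, then split into per-second batches."""
    if max_jobs < 0 or rate_per_second <= 0:
        raise ValueError("max_jobs must be >= 0 and rate_per_second > 0")
    capped = image_digests[:max_jobs]
    return [
        capped[i:i + rate_per_second]
        for i in range(0, len(capped), rate_per_second)
    ]
```

With the values above (maxJobs 5000, ratePerSecond 50), a full run would take at most 100 batches under this reading.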
Notify rule (YAML)
name: high-critical-alerts
match:
  eventKinds: ["report.ready","rescan.delta","zastava.admission"]
  minSeverity: high
  namespaces: ["prod-*"]
  vex: { includeAcceptedJustifications: false }
actions:
  - channel: slack
    target: "#sec-alerts"
    template: "concise"
    throttle: "5m"
  - channel: email
    target: "soc@acme.org"
    digest: "hourly"
enabled: true
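A sketch of how the match block of such a rule could be evaluated against an event. The event shape (kind, severity, namespace), the severity ordering, and the use of shell‑style globbing for namespaces are assumptions; the vex filter is omitted.

```python
from fnmatch import fnmatch

SEVERITY_ORDER = ["low", "medium", "high", "critical"]  # assumed ordering


def rule_matches(rule: dict, event: dict) -> bool:
    """True if an enabled rule's match block accepts the event."""
    if not rule.get("enabled", True):
        return False
    match = rule["match"]
    if event["kind"] not in match["eventKinds"]:
        return False
    # Severity must be at or above the rule's minimum.
    if (SEVERITY_ORDER.index(event["severity"])
            < SEVERITY_ORDER.index(match["minSeverity"])):
        return False
    # Namespace patterns such as "prod-*" are treated as shell-style globs.
    return any(fnmatch(event["namespace"], p) for p in match["namespaces"])
```

Throttle and digest settings would apply after a match, per action, when the delivery is scheduled.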