Files
git.stella-ops.org/docs/07_HIGH_LEVEL_ARCHITECTURE.md
Vladimir Moushkov 67d581d2e8
Some checks failed
Build Test Deploy / build-test (push) Has been cancelled
Build Test Deploy / authority-container (push) Has been cancelled
Build Test Deploy / docs (push) Has been cancelled
Build Test Deploy / deploy (push) Has been cancelled
Docs CI / lint-and-preview (push) Has been cancelled
up
2025-10-17 19:34:43 +03:00

431 lines
20 KiB
Markdown
Executable File
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Below is the **revised, consolidated** `high_level_architecture.md`.
It **absorbs** all content from `components.md` so you have a single, authoritative file. No separate components doc is required.
---
# HighLevel Architecture — **StellaOps** (Consolidated • 2025Q4)
> **Purpose.** A complete, implementationready map of StellaOps: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
> **Scope.** This file **replaces** the separate `components.md`; all component details now live here.
---
## 0) Product vision & principles
**Vision.** StellaOps is a **deterministic SBOM + VEX platform** for CI/CD and runtime, tuned for **speed** (perlayer deltas), **quiet output** (usagescoped views), and **verifiability** (DSSE + Rekor v2). It is **selfhostable**, **airgap capable**, and **commercially enforceable**: only licensed installations can produce **StellaOpsverified** attestations.
**Operating principles.**
* **Scannerowned SBOMs.** We generate our own BOMs; we do not warehouse thirdparty SBOM content (we can **link** to attested SBOMs).
* **Deterministic evidence.** Facts come from package DBs, installed metadata, linkers, and verified attestations; no fuzzy guessing in the core.
* **Perlayer caching.** Cache fragments by **layer digest** and compose image SBOMs via **CycloneDX BOMLink** / **SPDX ExternalRef**.
* **Inventory vs Usage.** Always record the full **inventory** of what exists; separately present **usage** (entrypoint closure + loaded libs).
* **Backend decides.** PASS/FAIL is produced by **Policy** + **VEX** + **Advisories**. The scanner reports facts.
* **Attest or it didnt happen.** Every export is signed as **intoto/DSSE** and logged in **Rekor v2**.
* **Sovereignready.** Cloud is used only for licensing and optional endorsement; everything else is firstparty and selfhostable.
---
## 1) Service topology & trust boundaries
### 1.1 Runtime inventory (firstparty)
| Service / Tool | Container image | Core role | Scale pattern |
| ------------------------------- | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/MachO, EntryTrace); emits perlayer SBOMs and composes image SBOMs. | Horizontal; queuedriven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for buildtime SBOMs as OCI **referrers**. | CIside; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLIorchestrated scanner container for postbuild scans. | Local/CI; ephemeral. |
| **Feedser.WebService** | `stellaops/feedser-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| **Vexer.WebService** | `stellaops/vexer-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usagegating); produces **policy digest**. | Inprocess; cache per digest. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | Onprem OIDC issuing **shortlived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper. | Local/CI. |
### 1.2 Thirdparty (selfhosted)
* **Fulcio** (Sigstore CA) — issues shortlived signing certs (keyless).
* **Rekor v2** (tilebacked transparency log).
* **MinIO** — S3compatible object store with lifecycle & Object Lock.
* **MongoDB** — catalog, advisories, VEX.
* **Queue** — Redis Streams / NATS / RabbitMQ (pluggable).
* **OCI Registry** — must support **Referrers API** (discover SBOMs/signatures).
### 1.3 Cloud licensing (StellaOps)
* **Licensing Service** (`www.stella-ops.org`) — issues longlived **License Tokens (LT)**; exchanges LT → **ProofofEntitlement (PoE)** bound to an installation key; revoke/introspect PoE; optional crosslog **endorsement**.
### 1.4 Diagram (control/data planes & trust)
```mermaid
flowchart LR
subgraph Cloud["www.stella-ops.org (Cloud)"]
LS[Licensing Service<br/>LT→PoE / revoke / introspect]
end
subgraph OnPrem["Customer Site (Self-hosted)"]
Auth[Authority (OIDC)\nOpTok (DPoP/mTLS)]
SW[Scanner.WebService]
WK[Scanner.Worker xN]
FEED[Feedser]
VEX[Vexer]
POL[Policy Engine (in Scanner.Web)]
SGN[Signer\n(entitlement + signing)]
ATT[Attestor\n(Rekor v2 submit/verify)]
UI[Web UI (Angular)]
Z[Zastava\n(Runtime Inspector/Enforcer)]
MIN[(MinIO S3)]
MGO[(MongoDB)]
QUE[(Queue/Streams)]
end
CLI[StellaOps.Cli / Buildx Plugin]
REG[(OCI Registry with Referrers)]
FUL[ Fulcio ]
REK[ Rekor v2 (tiles) ]
CLI -->|scan/build| SW
SW -->|jobs| QUE
QUE --> WK
WK --> MIN
SW --> MGO
FEED --> MGO
VEX --> MGO
UI --> SW
Z --> SW
SGN <--> Auth
SGN --> FUL
SGN -->|mTLS| ATT
ATT --> REK
SGN <-->|verify referrers| REG
```
**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI never sign.
---
## 2) Licensing & tokens (installationready, theftresistant)
**Twotoken model.**
* **License Token (LT)** — longlived JWT from **Licensing Service**; used **once** to enroll the installation; never used in hot path.
* **ProofofEntitlement (PoE)** — bound to the installation key (mTLS client cert **or** DPoPbound JWT with `cnf`); mediumlived; renewable; revocable.
* **Operational token (OpTok)** — 25min OIDC token from **Authority**, **senderconstrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**.
**Signer enforces both:** PoE proves entitlement; OpTok proves “who is calling now”. It also **independently verifies** the **scanner image digest** is **StellaOpssigned** via **Referrers + cosign** before signing anything.
**Enrollment sequence (LT → PoE).**
```plantuml
@startuml
actor Operator
participant "Install Agent" as IA
participant "Licensing Service" as LS
Operator -> IA: Provide LT
IA -> IA: Generate K_inst
IA -> LS: /license/enroll {LT, pub(K_inst)}
LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
@enduml
```
---
## 3) Scanner subsystem (facts engine)
### 3.1 Analyzers (deterministic only)
* **OS packages:** apk/dpkg/rpm (Linux); Windows MSI/SxS/GAC (M2).
* **Language (installed state):**
* Java (pom.properties / MANIFEST) → `pkg:maven/...`
* Node (`node_modules/*/package.json`) → `pkg:npm/...`
* Python (`*.dist-info/METADATA`) → `pkg:pypi/...`
* Go (buildinfo) → `pkg:golang/...`
* .NET (`*.deps.json`) → `pkg:nuget/...`
* **Rust:** deterministic **language markers** (symbol mangling) and crates only when present; otherwise `bin:{sha256}`.
* **Native:** ELF/PE/MachO imports, DT_NEEDED, RPATH/RUNPATH, symbol versions, PE version info.
* **EntryTrace:** parse `ENTRYPOINT`/`CMD`; shell AST; resolve launchers (Java/Node/Python) to terminal program; record file:line chain.
### 3.2 Caching & composition
* **Layer cache:** `{layerDigest → SBOM fragment + analyzer meta}`.
* **File CAS:** `{sha256(file) → parse result (ELF/JAR metadata/etc.)}`.
* **Composition:** build **image SBOMs** from fragments via **BOMLink/ExternalRef**; emit **two views**:
* **Inventory** (complete filesystem inventory).
* **Usage** (entrypoint closure + linked libs).
* **Transport:** JSON **and** **CycloneDX Protobuf** (compact, fast to parse).
* **Index:** BOMIndex sidecar with purl table + roaring bitmap + `usedByEntrypoint` flag for fast joins.
### 3.3 Diff (image → layer → package)
* Added / Removed / Versionchanged changes, **attributed** to the layer that caused them.
* Raw diffs preserved; backend view applies **VEX + Policy**.
### 3.4 Buildtime SBOMs (fast CI path)
* Buildx **generator** runs analyzers during `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, attaches SBOMs as **OCI referrers**.
* Scanner.WebService can trust these (policyconfigurable) and **skip** rescan; DSSE + Rekor v2 can be done either at build time or postpush via Signer/Attestor.
---
## 4) Backend evaluation (decider)
### 4.1 Feedser (advisories)
* Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in Mongo; exports **deterministic JSON** and **Trivy DB**.
* Offline kit bundles for airgapped sites.
### 4.2 Vexer (VEX)
* Ingests **OpenVEX / CSAF VEX / CycloneDX VEX**; normalizes claims; retains conflicts; computes **consensus** with provider trust weights and justification gates.
### 4.3 Policy Engine (YAML DSL)
* Matchers: `image/repo/env/purl/cve/vendor/source/path/layerDigest/usedByEntrypoint`
* Actions: `ignore(until, justification)`, `fail`, `warn`, `defer`, `requireVEX{vendors, justifications}`, `escalate {sev, KEV, EPSS}`, license constraints.
* Produces a **policy digest** (SHA256 of canonicalized policy).
### 4.4 PASS/FAIL flow
1. SBOM (Inventory / Usage) → join with **Feedser** advisories.
2. Apply **Vexer** consensus (statuses & justifications).
3. Apply **Policy**; compute PASS/FAIL with waiver TTLs.
4. Sign the **final report** (DSSE via **Signer**) and log to **Rekor v2** via **Attestor**.
---
## 5) Runtime enforcement (Zastava)
* **Observer:** inventories running containers, checks image signatures, SBOM presence (referrers), detects drift (entrypoint chain divergence), flags unapproved images.
* **Admission Webhook (optional):** blocks policyfail pods (dryrun first).
* **Integration:** posts runtime events to Scanner.WebService; can request **delta scans** on changed layers.
---
## 6) Storage & catalogs (MinIO/Mongo)
**MinIO layout**
```
s3://stellaops/
layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin
attest/<artifactSha256>.dsse.json
```
**Catalog (Mongo)**
* `artifacts` (type/format/sha/size/rekor/ttl/immutable/refCount/createdAt)
* `images`, `layers`, `links`, `lifecycleRules`
**Retention**
* MinIO **ILM** for coarse TTL; Scanner.WebService GC decrements `refCount` and deletes unreferenced metadata; **Object Lock** for immutable classes (auditable artifacts).
---
## 7) APIs (consolidated surface)
### 7.1 Scanner.WebService
```
POST /api/scans { imageRef|digest, force? } → { scanId }
GET /api/scans/{id} → { status, digests, artifacts[] }
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports { imageDigest, policyRevision? } → { reportId, rekorUrl }
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
GET /healthz | /readyz | /metrics
```
### 7.2 Signer (mTLS; hard gate)
```
POST /sign/dsse # body: {subjectHash, imageDigest, predicate}; headers: OpTok (DPoP/mTLS) + PoE
GET /verify/referrers?imageDigest=sha256:... # is this image StellaOps-signed?
```
### 7.3 Attestor (mTLS)
```
POST /rekor/entries # DSSE bundle → {uuid, index, proof, logURL}
GET /rekor/entries/{uuid}
```
### 7.4 Authority (OIDC)
* `/.well-known/openid-configuration`, `/oauth/token` (DPoP/mTLS), `/oauth/introspect`, `/jwks`
### 7.5 Licensing (cloud)
```
POST /license/enroll { LT, pubKey } → PoE + introspection endpoints
POST /license/revoke { license_id } → ok
POST /license/introspect { poe } → { active, claims, exp }
POST /attest/endorse { bundle } → endorsement bundle (optional)
```
---
## 8) Security & verifiability
* **Senderconstrained tokens.** All operational calls use **DPoP** (RFC9449) or **mTLSbound** tokens (RFC8705).
* **Entitlement.** **PoE** is mandatory; revocation honored online.
* **Release integrity.** **Signer** independently verifies **scanner image digest** via **Referrers + cosign** before signing.
* **Separation of duties.** Scanner/UI cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
* **Verifiers.** Anyone can verify: DSSE signature → certificate chain to **StellaOps Fulcio/KMS root****Rekor v2** inclusion.
* **Community vs Authorized.** Free/community runs throttled with no official attestations; authorized runs full speed and produce **StellaOpsverified** bundles.
**DSSE predicate (SBOM/report)**
```json
{
"predicateType": "https://stella-ops.org/attestations/sbom/1",
"subject": [{ "name": "s3://stellaops/images/<digest>/inventory.cdx.pb", "digest": { "sha256": "<sha256>" } }],
"predicate": {
"image_digest": "<sha256:...>",
"stellaops_version": "2.3.1 (2027.04)",
"license_id": "LIC-9F2A...",
"customer_id": "CUST-ACME",
"plan": "pro",
"policy_digest": "sha256:...",
"views": ["inventory","usage"],
"created": "2025-10-17T12:34:56Z"
}
}
```
**BOMIndex sidecar**
Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags for fast policy joins.
---
## 9) Scale, performance & quotas
* **Workers:** horizontal; **distributed lock per layer digest**; global CAS in MinIO.
* **Queues:** Redis Streams / NATS / RabbitMQ. HPA by queue depth, CPU, memory.
* **Registry throttling:** perregistry concurrency budgets.
* **Targets:**
* Buildtime path P95 ≤35s on warmed bases.
* Postbuild delta scan P95 ≤10s for 200MB images.
* Policy + VEX evaluation ≤500ms for 5k components using BOMIndex.
* **Quotas:** license plan enforces QPS/concurrency/size; **Signer** throttles and can deny DSSE.
---
## 10) DevOps & distribution
* **Releases:** all firstparty images **cosignsigned**; labels embed `org.stellaops.version` and `org.stellaops.release_date`.
* **Channels:**
* **Community** (public registry): throttled, nonattesting.
* **Authorized** (private registry): full speed, DSSE enabled.
* **Client update flow:** containers selfverify signatures at boot; report version; **Signer** enforces `valid_release_year` / `max_version` from PoE before signing.
* **Compose skeleton:**
```yaml
services:
authority: { image: stellaops/authority }
fulcio: { image: sigstore/fulcio }
rekor: { image: sigstore/rekor-v2 }
minio: { image: minio/minio, command: server /data --console-address ":9001" }
mongo: { image: mongo:7 }
signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
scanner-web:{ image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
scanner-worker:
image: stellaops/scanner-worker
deploy: { replicas: 4 }
depends_on: [scanner-web]
feedser: { image: stellaops/feedser-web, depends_on: [mongo] }
vexer: { image: stellaops/vexer-web, depends_on: [mongo] }
ui: { image: stellaops/ui, depends_on: [scanner-web, feedser, vexer] }
```
* **Backups:** Mongo dumps; MinIO versioned buckets & replication; Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation.
---
## 11) Observability & audit
* **Metrics:** scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
* **Tracing:** perstage spans; correlation IDs across Scanner→Signer→Attestor.
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID.
* **Compliance:** MinIO **Object Lock** for immutable artifacts; reproducible outputs via policy digest + SBOM digest in predicate.
---
## 12) Roadmap (anchored to this architecture)
* M2: Windows MSI/SxS/GAC analyzers; deeper Rust (DWARF enrichers).
* M2: Buildx generator certified flows; crossregistry trust policies.
* M3: PatchPresence plugin (signaturebased backport detection), optin.
* M3: Zastava Admission control GA with policy presets and dryrun→enforce stages.
* Continuous: Policy UX (waiver TTLs, vendor rules), Vexer connectors expansion.
---
## 13) Canonical sequences (verification & signing)
**Sign & log (OpTok + PoE, image verify, DSSE, Rekor).**
```mermaid
sequenceDiagram
autonumber
participant Scan as Scanner.WebService
participant Auth as Authority (OIDC)
participant Sign as Signer
participant Reg as OCI Registry
participant Ful as Fulcio/KMS
participant Att as Attestor
participant Rek as Rekor v2
Scan->>Auth: Get OpTok (DPoP/mTLS)
Scan->>Sign: sign(request) + OpTok + PoE + DPoP proof
Sign->>Auth: Validate OpTok & sender-constraint
Sign->>Sign: Validate PoE (introspect/revocation)
Sign->>Reg: Verify scanner image is StellaOps-signed (Referrers + cosign)
alt OK
Sign->>Ful: Get signing cert (keyless) or use KMS key
Sign-->>Scan: DSSE bundle (cert chain)
Scan->>Att: Submit bundle
Att-->>Rek: Create entry
Rek-->>Att: {uuid,index,proof}
Att-->>Scan: Rekor URL
else Deny
Sign-->>Scan: 403 (no attestation)
end
```
**Verification (third party).**
```plantuml
@startuml
actor Verifier
participant "stellaops verify" as Tool
database "Fulcio/KMS root" as Root
participant "Rekor v2" as R2
Verifier -> Tool: bundle (URL/file)
Tool -> Tool: Verify DSSE signature
Tool -> Root: Verify cert chain to StellaOps root
Tool -> R2: Verify inclusion proof / query by UUID
Tool -> Verifier: OK + claims (license_id, policy_digest, version)
@enduml
```
---
**End of `high_level_architecture.md` (Consolidated).**