git.stella-ops.org/docs/07_HIGH_LEVEL_ARCHITECTURE.md
2025-10-18 20:46:16 +03:00


Below is the revised, consolidated high_level_architecture.md. It absorbs all content from components.md so you have a single, authoritative file. No separate components doc is required.


High-Level Architecture: StellaOps (Consolidated • 2025Q4)

Purpose. A complete, implementation-ready map of StellaOps: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic. Scope. This file replaces the separate components.md; all component details now live here.


0) Product vision & principles

Vision. StellaOps is a deterministic SBOM + VEX platform for CI/CD and runtime, tuned for speed (per-layer deltas), quiet output (usage-scoped views), and verifiability (DSSE + Rekor v2). It is self-hostable, air-gap capable, and commercially enforceable: only licensed installations can produce StellaOps-verified attestations.

Operating principles.

  • Scanner-owned SBOMs. We generate our own BOMs; we do not warehouse third-party SBOM content (we can link to attested SBOMs).
  • Deterministic evidence. Facts come from package DBs, installed metadata, linkers, and verified attestations; no fuzzy guessing in the core.
  • Per-layer caching. Cache fragments by layer digest and compose image SBOMs via CycloneDX BOM-Link / SPDX ExternalRef.
  • Inventory vs Usage. Always record the full inventory of what exists; separately present usage (entrypoint closure + loaded libs).
  • Backend decides. PASS/FAIL is produced by Policy + VEX + Advisories. The scanner reports facts.
  • Attest or it didn't happen. Every export is signed as in-toto/DSSE and logged in Rekor v2.
  • Sovereign-ready. Cloud is used only for licensing and optional endorsement; everything else is first-party and self-hostable.

1) Service topology & trust boundaries

1.1 Runtime inventory (first-party)

| Service / Tool | Container image | Core role | Scale pattern |
|---|---|---|---|
| Scanner.WebService | stellaops/scanner-web | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports. | Stateless; N replicas behind LB. |
| Scanner.Worker | stellaops/scanner-worker | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach-O, EntryTrace); emits per-layer SBOMs and composes image SBOMs. | Horizontal; queue-driven; sharded by layer digest. |
| Scanner.Sbomer.BuildXPlugin | stellaops/sbom-indexer | BuildKit generator for build-time SBOMs as OCI referrers. | CI-side; ephemeral. |
| Scanner.Sbomer.DockerImage | stellaops/scanner-cli | CLI-orchestrated scanner container for post-build scans. | Local/CI; ephemeral. |
| Concelier.WebService | stellaops/concelier-web | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| Excititor.WebService | stellaops/excititor-web | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| Policy Engine | (in scanner-web) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage-gating); produces policy digest. | In-process; cache per digest. |
| Signer | stellaops/signer | Hard gate: validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| Attestor | stellaops/attestor | Posts DSSE bundles to Rekor v2; verification endpoints. | Stateless; HPA by QPS. |
| Authority | stellaops/authority | On-prem OIDC issuing short-lived OpToks with DPoP/mTLS sender constraint. | HA behind LB. |
| Zastava (Runtime) | stellaops/zastava | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| Web UI | stellaops/ui | Angular app for scans, diffs, policy, VEX, runtime, reports. | Stateless. |
| StellaOps.Cli | stellaops/cli | CLI for init/scan/export/diff/policy/report/verify; Buildx helper. | Local/CI. |

1.2 Third-party (self-hosted)

  • Fulcio (Sigstore CA): issues short-lived signing certs (keyless).
  • Rekor v2 (tile-backed transparency log).
  • MinIO: S3-compatible object store with lifecycle & Object Lock.
  • MongoDB: catalog, advisories, VEX.
  • Queue: Redis Streams / NATS / RabbitMQ (pluggable).
  • OCI Registry: must support Referrers API (discover SBOMs/signatures).

1.3 Cloud licensing (StellaOps)

  • Licensing Service (www.stella-ops.org): issues long-lived License Tokens (LT); exchanges LT → Proof-of-Entitlement (PoE) bound to an installation key; revoke/introspect PoE; optional cross-log endorsement.

1.4 Diagram (control/data planes & trust)

flowchart LR
  subgraph Cloud["www.stella-ops.org (Cloud)"]
    LS[Licensing Service<br/>LT→PoE / revoke / introspect]
  end

  subgraph OnPrem["Customer Site (Self-hosted)"]
    Auth["Authority (OIDC)<br/>OpTok (DPoP/mTLS)"]
    SW[Scanner.WebService]
    WK[Scanner.Worker xN]
    FEED[Concelier]
    VEX[Excititor]
    POL["Policy Engine (in Scanner.Web)"]
    SGN["Signer<br/>(entitlement + signing)"]
    ATT["Attestor<br/>(Rekor v2 submit/verify)"]
    UI["Web UI (Angular)"]
    Z["Zastava<br/>(Runtime Inspector/Enforcer)"]
    MIN[(MinIO S3)]
    MGO[(MongoDB)]
    QUE[(Queue/Streams)]
  end

  CLI[StellaOps.Cli / Buildx Plugin]
  REG[(OCI Registry with Referrers)]
  FUL[ Fulcio ]
  REK["Rekor v2 (tiles)"]

  CLI -->|scan/build| SW
  SW -->|jobs| QUE
  QUE --> WK
  WK --> MIN
  SW --> MGO
  FEED --> MGO
  VEX --> MGO
  UI --> SW
  Z --> SW

  SGN <--> Auth
  SGN --> FUL
  SGN -->|mTLS| ATT
  ATT --> REK

  SGN <-->|verify referrers| REG

Trust boundaries. Only Signer can sign; only Attestor can write to Rekor v2. Scanner/UI never sign.


2) Licensing & tokens (installation-ready, theft-resistant)

Two-token model. PoE and OpTok are the two tokens enforced on every signing call; the License Token is used only once, at enrollment.

  • License Token (LT): long-lived JWT from Licensing Service; used once to enroll the installation; never used in hot path.
  • Proof-of-Entitlement (PoE): bound to the installation key (mTLS client cert or DPoP-bound JWT with cnf); medium-lived; renewable; revocable.
  • Operational token (OpTok): 25min OIDC token from Authority, sender-constrained (DPoP or mTLS). Used to authenticate to Signer/Scanner.WebService.

Signer enforces both: PoE proves entitlement; OpTok proves “who is calling now”. It also independently verifies that the scanner image digest is StellaOps-signed via Referrers + cosign before signing anything.
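
The gate can be pictured as three independent checks that must all pass before any DSSE signature is produced. The sketch below is illustrative only: `SignRequest`, its fields, and `signer_gate` are hypothetical names, and each boolean stands in for a real verification step (OpTok validation, PoE introspection, Referrers + cosign check) that is not implemented here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SignRequest:
    """Hypothetical summary of the checks Signer has already performed."""
    optok_valid: bool           # OpTok verified and sender-constrained (DPoP/mTLS)
    poe_active: bool            # PoE present, unexpired, and not revoked
    scanner_image_signed: bool  # scanner image digest verified via Referrers + cosign


def signer_gate(req: SignRequest) -> Tuple[bool, str]:
    """Return (allowed, reason); every check must pass before signing."""
    if not req.optok_valid:
        return False, "invalid or unbound OpTok"
    if not req.poe_active:
        return False, "PoE missing, expired, or revoked"
    if not req.scanner_image_signed:
        return False, "scanner image is not StellaOps-signed"
    return True, "ok"


print(signer_gate(SignRequest(True, True, True)))
print(signer_gate(SignRequest(True, False, True)))
```

Returning the first failing reason keeps the deny path explicit, which matches the "hard gate" role described for Signer.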

Enrollment sequence (LT → PoE).

@startuml
actor Operator
participant "Install Agent" as IA
participant "Licensing Service" as LS
Operator -> IA: Provide LT
IA -> IA: Generate K_inst
IA -> LS: /license/enroll {LT, pub(K_inst)}
LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
@enduml
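
When PoE is a JWT, its `cnf` claim has to identify K_inst. One common way to do that is the RFC 7638 JWK thumbprint, which DPoP (RFC 9449) carries as `cnf.jkt`; whether StellaOps uses exactly this form is an assumption. The sketch computes such a thumbprint and builds a hypothetical enrollment body using the `{ LT, pubKey }` field names from section 7.5. The key coordinates are placeholders, not real key material.

```python
import base64
import hashlib
import json

# Required members per key type: RFC 7638 (EC, RSA) and RFC 8037 (OKP).
_REQUIRED = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint, base64url-encoded without padding."""
    members = {name: jwk[name] for name in _REQUIRED[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def enroll_body(license_token: str, public_jwk: dict) -> dict:
    """Hypothetical body for POST /license/enroll."""
    return {"LT": license_token, "pubKey": public_jwk}


# Placeholder public key for illustration only.
example_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y", "kid": "k-inst-1"}
print(jwk_thumbprint(example_jwk))
```

Only the required members feed the hash, so optional fields such as `kid` do not change the thumbprint.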

3) Scanner subsystem (facts engine)

3.1 Analyzers (deterministic only)

  • OS packages: apk/dpkg/rpm (Linux); Windows MSI/SxS/GAC (M2).

  • Language (installed state):

    • Java (pom.properties / MANIFEST) → pkg:maven/...
    • Node (node_modules/*/package.json) → pkg:npm/...
    • Python (*.dist-info/METADATA) → pkg:pypi/...
    • Go (buildinfo) → pkg:golang/...
    • .NET (*.deps.json) → pkg:nuget/...
    • Rust: deterministic language markers (symbol mangling) and crates only when present; otherwise bin:{sha256}.
  • Native: ELF/PE/Mach-O imports, DT_NEEDED, RPATH/RUNPATH, symbol versions, PE version info.

  • EntryTrace: parse ENTRYPOINT/CMD; shell AST; resolve launchers (Java/Node/Python) to terminal program; record file:line chain.
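
As an illustration of the "installed state" approach, the following sketch derives a `pkg:npm/...` identifier from the text of an installed `package.json`. The function name `npm_purl` is hypothetical and this is not the analyzer's actual code; it only shows that the mapping is mechanical, including percent-encoding of the `@` in scoped package names as the purl specification requires.

```python
import json
from urllib.parse import quote


def npm_purl(package_json_text: str) -> str:
    """Build a package URL from the name/version fields of a package.json."""
    meta = json.loads(package_json_text)
    name, version = meta["name"], meta["version"]
    # Scoped names look like "@scope/name"; encode each path segment separately.
    encoded_name = "/".join(quote(part, safe="") for part in name.split("/"))
    return f"pkg:npm/{encoded_name}@{quote(version, safe='')}"


print(npm_purl('{"name": "left-pad", "version": "1.3.0"}'))
print(npm_purl('{"name": "@angular/core", "version": "17.0.0"}'))
```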

3.2 Caching & composition

  • Layer cache: {layerDigest → SBOM fragment + analyzer meta}.

  • File CAS: {sha256(file) → parse result (ELF/JAR metadata/etc.)}.

  • Composition: build image SBOMs from fragments via BOM-Link/ExternalRef; emit two views:

    • Inventory (complete filesystem inventory).
    • Usage (entrypoint closure + linked libs).
  • Transport: JSON and CycloneDX Protobuf (compact, fast to parse).

  • Index: BOM-Index sidecar with purl table + roaring bitmap + usedByEntrypoint flag for fast joins.
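
The caching idea can be sketched in a few lines. This is a simplified in-memory model under stated assumptions: `LayerCache` and `compose` are hypothetical names, fragments are plain lists of dicts, and composition is done by merging rather than by emitting BOM-Link references as the real system does. It shows why a shared base layer is analyzed only once across images.

```python
from typing import Callable, Dict, List, Tuple

Component = Dict[str, object]


class LayerCache:
    """Caches one SBOM fragment per layer digest."""

    def __init__(self, analyze: Callable[[str], List[Component]]) -> None:
        self._analyze = analyze
        self._fragments: Dict[str, List[Component]] = {}
        self.misses = 0

    def fragment(self, layer_digest: str) -> List[Component]:
        if layer_digest not in self._fragments:
            self.misses += 1  # analyzers run only on a cache miss
            self._fragments[layer_digest] = self._analyze(layer_digest)
        return self._fragments[layer_digest]


def compose(cache: LayerCache, layer_digests: List[str]) -> Tuple[List[Component], List[Component]]:
    """Return (inventory, usage) for an image given its layers, base first."""
    merged: Dict[str, Component] = {}
    for digest in layer_digests:
        for comp in cache.fragment(digest):
            merged[str(comp["purl"])] = {**comp, "layer": digest}
    inventory = [merged[purl] for purl in sorted(merged)]
    usage = [c for c in inventory if c.get("usedByEntrypoint")]
    return inventory, usage


# Fake analyzer output keyed by (made-up) layer digests.
FAKE = {
    "sha256:base": [{"purl": "pkg:deb/debian/libc6@2.36", "usedByEntrypoint": True}],
    "sha256:app1": [{"purl": "pkg:npm/left-pad@1.3.0", "usedByEntrypoint": False}],
    "sha256:app2": [{"purl": "pkg:pypi/requests@2.31.0", "usedByEntrypoint": True}],
}
cache = LayerCache(lambda digest: FAKE[digest])
inv1, use1 = compose(cache, ["sha256:base", "sha256:app1"])
inv2, use2 = compose(cache, ["sha256:base", "sha256:app2"])
print(cache.misses, len(inv1), len(use1), len(inv2), len(use2))
```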

3.3 Diff (image → layer → package)

  • Added / Removed / Version-changed changes, attributed to the layer that caused them.
  • Raw diffs preserved; backend view applies VEX + Policy.
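
A minimal sketch of the raw diff, assuming each image is represented as a mapping from package name to its version and owning layer (a hypothetical shape, not the catalog schema). Added and changed entries are attributed to the layer in the new image; removed entries are attributed to the layer that held them in the old image, which is a simplification.

```python
from typing import Dict, List, Tuple

Image = Dict[str, Dict[str, str]]  # name -> {"version": ..., "layer": ...}


def diff_images(old: Image, new: Image) -> dict:
    added = [{"name": n, **new[n]} for n in sorted(new.keys() - old.keys())]
    removed = [{"name": n, **old[n]} for n in sorted(old.keys() - new.keys())]
    changed = [
        {"name": n, "from": old[n]["version"], "to": new[n]["version"], "layer": new[n]["layer"]}
        for n in sorted(old.keys() & new.keys())
        if old[n]["version"] != new[n]["version"]
    ]
    by_layer: Dict[str, List[Tuple[str, str]]] = {}
    for kind, items in (("added", added), ("removed", removed), ("changed", changed)):
        for item in items:
            by_layer.setdefault(item["layer"], []).append((kind, item["name"]))
    return {"added": added, "removed": removed, "changed": changed, "byLayer": by_layer}


old_image = {
    "openssl": {"version": "3.0.1", "layer": "sha256:base"},
    "zlib": {"version": "1.2.13", "layer": "sha256:base"},
}
new_image = {
    "openssl": {"version": "3.0.2", "layer": "sha256:patch"},
    "curl": {"version": "8.4.0", "layer": "sha256:app"},
}
result = diff_images(old_image, new_image)
print(result["byLayer"])
```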

3.4 Build-time SBOMs (fast CI path)

  • Buildx generator runs analyzers during docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer and attaches SBOMs as OCI referrers.
  • Scanner.WebService can trust these (policy-configurable) and skip rescan; DSSE + Rekor v2 can be done either at build time or post-push via Signer/Attestor.

4) Backend evaluation (decider)

4.1 Concelier (advisories)

  • Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in Mongo; exports deterministic JSON and Trivy DB.
  • Offline kit bundles for air-gapped sites.

4.2 Excititor (VEX)

  • Ingests OpenVEX / CSAF VEX / CycloneDX VEX; normalizes claims; retains conflicts; computes consensus with provider trust weights and justification gates.
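
One plausible reading of "consensus with provider trust weights and justification gates" is a weighted vote in which a `not_affected` claim only counts if it carries a justification. The sketch below implements that reading; the claim shape, the weights, and the gating rule are all assumptions made for illustration, not Excititor's actual algorithm. Losing claims are kept as conflicts rather than discarded.

```python
from collections import defaultdict
from typing import Dict, List


def consensus(claims: List[dict], trust: Dict[str, float]) -> dict:
    scores: Dict[str, float] = defaultdict(float)
    gated = []
    for claim in claims:
        # Justification gate (assumed rule): unjustified not_affected does not vote.
        if claim["status"] == "not_affected" and not claim.get("justification"):
            gated.append(claim)
            continue
        scores[claim["status"]] += trust.get(claim["provider"], 0.0)
    if not scores:
        return {"status": None, "conflicts": [], "gated": gated}
    # Iterating in sorted order makes tie-breaking deterministic.
    winner = max(sorted(scores), key=scores.get)
    conflicts = [c for c in claims if c not in gated and c["status"] != winner]
    return {"status": winner, "conflicts": conflicts, "gated": gated}


trust = {"vendor": 0.9, "scanner-x": 0.5}
claims = [
    {"provider": "vendor", "status": "not_affected", "justification": "vulnerable_code_not_present"},
    {"provider": "scanner-x", "status": "affected"},
]
print(consensus(claims, trust)["status"])
```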

4.3 Policy Engine (YAML DSL)

  • Matchers: image/repo/env/purl/cve/vendor/source/path/layerDigest/usedByEntrypoint
  • Actions: ignore(until, justification), fail, warn, defer, requireVEX{vendors, justifications}, escalate {sev, KEV, EPSS}, license constraints.
  • Produces a policy digest (SHA-256 of canonicalized policy).
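
The digest only works as an identity if canonicalization removes incidental differences such as key order and whitespace. The canonical form is not specified here, so this sketch assumes the parsed policy is serialized as compact JSON with sorted keys; the policy fields shown are made up for the example.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over an assumed canonical form: compact JSON with sorted keys."""
    canonical = json.dumps(
        policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


# Two semantically identical policies written with different key order.
policy_a = {"rules": [{"match": {"cve": "CVE-2024-0001"}, "action": "warn"}], "version": 1}
policy_b = {"version": 1, "rules": [{"action": "warn", "match": {"cve": "CVE-2024-0001"}}]}
print(policy_digest(policy_a) == policy_digest(policy_b))
```

List order is still significant in this scheme, which is usually what a rule list wants.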

4.4 PASS/FAIL flow

  1. SBOM (Inventory / Usage) → join with Concelier advisories.
  2. Apply Excititor consensus (statuses & justifications).
  3. Apply Policy; compute PASS/FAIL with waiver TTLs.
  4. Sign the final report (DSSE via Signer) and log to Rekor v2 via Attestor.
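
Steps 1 to 3 can be sketched as a single pass over the SBOM. The data shapes below (advisories keyed by purl, VEX statuses keyed by purl and CVE, waivers keyed by CVE with an expiry date) are hypothetical simplifications, and signing (step 4) is left out.

```python
from datetime import date
from typing import Dict, List, Tuple


def evaluate(
    components: List[str],
    advisories: Dict[str, List[str]],
    vex: Dict[Tuple[str, str], str],
    waivers: Dict[str, date],
    today: date,
) -> Tuple[str, List[Tuple[str, str]]]:
    failing = []
    for purl in components:
        for cve in advisories.get(purl, []):        # 1. join with advisories
            if vex.get((purl, cve)) in ("not_affected", "fixed"):
                continue                            # 2. VEX consensus suppresses
            until = waivers.get(cve)
            if until is not None and today <= until:
                continue                            # 3. waiver still within its TTL
            failing.append((purl, cve))
    return ("FAIL" if failing else "PASS"), failing


components = ["pkg:npm/a@1"]
advisories = {"pkg:npm/a@1": ["CVE-1", "CVE-2"]}
vex = {("pkg:npm/a@1", "CVE-1"): "not_affected"}
waivers = {"CVE-2": date(2025, 12, 31)}
print(evaluate(components, advisories, vex, waivers, date(2025, 10, 18)))
print(evaluate(components, advisories, vex, waivers, date(2026, 1, 1)))
```

The second call fails because the waiver has expired, which is the effect of a waiver TTL.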

5) Runtime enforcement (Zastava)

  • Observer: inventories running containers, checks image signatures, SBOM presence (referrers), detects drift (entrypoint chain divergence), flags unapproved images.
  • Admission Webhook (optional): blocks policy-fail pods (dry-run first).
  • Integration: posts runtime events to Scanner.WebService; can request delta scans on changed layers.

6) Storage & catalogs (MinIO/Mongo)

MinIO layout

s3://stellaops/
  layers/<sha256>/sbom.cdx.json.zst
  layers/<sha256>/sbom.spdx.json.zst
  images/<imgDigest>/inventory.cdx.pb
  images/<imgDigest>/usage.cdx.pb
  indexes/<imgDigest>/bom-index.bin
  attest/<artifactSha256>.dsse.json

Catalog (Mongo)

  • artifacts (type/format/sha/size/rekor/ttl/immutable/refCount/createdAt)
  • images, layers, links, lifecycleRules

Retention

  • MinIO ILM for coarse TTL; Scanner.WebService GC decrements refCount and deletes unreferenced metadata; Object Lock for immutable classes (auditable artifacts).
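
The reference-counting part of retention can be sketched as follows. The catalog is modeled as a plain dict using the `refCount` and `immutable` field names from the artifacts collection; the functions `release` and `collect` are hypothetical, and deletion of the underlying MinIO object is not shown.

```python
from typing import Dict, List


def release(catalog: Dict[str, dict], artifact_id: str) -> None:
    """Drop one reference to an artifact, never going below zero."""
    entry = catalog[artifact_id]
    entry["refCount"] = max(0, entry["refCount"] - 1)


def collect(catalog: Dict[str, dict]) -> List[str]:
    """Remove unreferenced, non-immutable entries and return their ids."""
    doomed = sorted(
        artifact_id
        for artifact_id, entry in catalog.items()
        if entry["refCount"] == 0 and not entry["immutable"]
    )
    for artifact_id in doomed:
        del catalog[artifact_id]
    return doomed


catalog = {
    "layer-sbom-1": {"refCount": 1, "immutable": False},
    "report-dsse-1": {"refCount": 0, "immutable": True},   # Object Lock class
    "layer-sbom-2": {"refCount": 2, "immutable": False},
}
release(catalog, "layer-sbom-1")
print(collect(catalog))
```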

7) APIs (consolidated surface)

7.1 Scanner.WebService

POST /api/scans                         { imageRef|digest, force? } → { scanId }
GET  /api/scans/{id}                    → { status, digests, artifacts[] }
GET  /api/sboms/{imageDigest}           ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET  /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports                       { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports                       { imageDigest, policyRevision? } → { reportId, rekorUrl }
GET  /api/catalog/artifacts/{id}        → { size, ttl, immutable, rekor, refs }
GET  /healthz | /readyz | /metrics
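
For clients, the SBOM endpoint's query parameters can be validated before any request is sent. The helper below is a sketch: `sbom_url` and the host `scanner.example.internal` are hypothetical, while the path, `format` values, and `view` values come from the listing above. It only builds the URL and performs no network call.

```python
from urllib.parse import quote, urlencode

FORMATS = {"cdx-json", "cdx-pb", "spdx-json"}
VIEWS = {"inventory", "usage"}


def sbom_url(base_url: str, image_digest: str, fmt: str = "cdx-json", view: str = "inventory") -> str:
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if view not in VIEWS:
        raise ValueError(f"unsupported view: {view}")
    # Keep the ':' of "sha256:<hex>" readable; it is legal inside a path segment.
    path = "/api/sboms/" + quote(image_digest, safe=":")
    return base_url.rstrip("/") + path + "?" + urlencode({"format": fmt, "view": view})


print(sbom_url("https://scanner.example.internal/", "sha256:abc123", view="usage"))
```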

7.2 Signer (mTLS; hard gate)

POST /sign/dsse    # body: {subjectHash, imageDigest, predicate}; headers: OpTok (DPoP/mTLS) + PoE
GET  /verify/referrers?imageDigest=sha256:...  # is this image StellaOps-signed?

7.3 Attestor (mTLS)

POST /rekor/entries      # DSSE bundle → {uuid, index, proof, logURL}
GET  /rekor/entries/{uuid}

7.4 Authority (OIDC)

  • /.well-known/openid-configuration, /oauth/token (DPoP/mTLS), /oauth/introspect, /jwks

7.5 Licensing (cloud)

POST /license/enroll      { LT, pubKey }           → PoE + introspection endpoints
POST /license/revoke      { license_id }           → ok
POST /license/introspect  { poe }                  → { active, claims, exp }
POST /attest/endorse      { bundle }               → endorsement bundle (optional)

8) Security & verifiability

  • Sender-constrained tokens. All operational calls use DPoP (RFC 9449) or mTLS-bound tokens (RFC 8705).
  • Entitlement. PoE is mandatory; revocation honored online.
  • Release integrity. Signer independently verifies scanner image digest via Referrers + cosign before signing.
  • Separation of duties. Scanner/UI cannot sign; only Signer can sign; only Attestor can write to Rekor v2.
  • Verifiers. Anyone can verify: DSSE signature → certificate chain to StellaOps Fulcio/KMS root → Rekor v2 inclusion.
  • Community vs Authorized. Free/community runs throttled with no official attestations; authorized runs full speed and produce StellaOps-verified bundles.

DSSE predicate (SBOM/report)

{
  "predicateType": "https://stella-ops.org/attestations/sbom/1",
  "subject": [{ "name": "s3://stellaops/images/<digest>/inventory.cdx.pb", "digest": { "sha256": "<sha256>" } }],
  "predicate": {
    "image_digest": "<sha256:...>",
    "stellaops_version": "2.3.1 (2027.04)",
    "license_id": "LIC-9F2A...",
    "customer_id": "CUST-ACME",
    "plan": "pro",
    "policy_digest": "sha256:...",
    "views": ["inventory","usage"],
    "created": "2025-10-17T12:34:56Z"
  }
}
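
A DSSE signature is computed over the Pre-Authentication Encoding (PAE) of the payload type and payload, not over the raw JSON. The sketch below implements PAE as defined in the DSSE specification and wraps a statement in an envelope with no signatures yet; `unsigned_envelope` is a hypothetical helper, and the compact sorted-key JSON serialization is an assumption of this example (DSSE itself signs whatever bytes it is given).

```python
import base64
import json

IN_TOTO_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSEv1 Pre-Authentication Encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(statement: dict) -> dict:
    """Envelope skeleton; Signer would append entries to 'signatures'."""
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "payloadType": IN_TOTO_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }


# Example taken from the DSSE specification.
print(pae("http://example.com/HelloWorld", b"hello world"))
```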

BOM-Index sidecar. Binary header + purl table + roaring bitmaps; optional usedByEntrypoint flags for fast policy joins.


9) Scale, performance & quotas

  • Workers: horizontal; distributed lock per layer digest; global CAS in MinIO.

  • Queues: Redis Streams / NATS / RabbitMQ. HPA by queue depth, CPU, memory.

  • Registry throttling: per-registry concurrency budgets.

  • Targets:

    • Build-time path P95 ≤35s on warmed bases.
    • Post-build delta scan P95 ≤10s for 200MB images.
    • Policy + VEX evaluation ≤500ms for 5k components using BOM-Index.
  • Quotas: license plan enforces QPS/concurrency/size; Signer throttles and can deny DSSE.
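
Sharding by layer digest can be as simple as reducing a prefix of the digest modulo the shard count, since SHA-256 output is already uniformly distributed. This is a sketch of that idea only; `shard_for` is a hypothetical name and the real queue partitioning scheme is not described here.

```python
def shard_for(layer_digest: str, shard_count: int) -> int:
    """Map 'sha256:<64 hex chars>' to a shard index in [0, shard_count)."""
    algo, _, hex_part = layer_digest.partition(":")
    if algo != "sha256" or len(hex_part) != 64:
        raise ValueError(f"unexpected digest: {layer_digest!r}")
    # The first 64 bits are plenty for an even spread across shards.
    return int(hex_part[:16], 16) % shard_count


print(shard_for("sha256:" + "f" * 64, 4))
```

Because the mapping is a pure function of the digest, every replica routes the same layer to the same shard, which keeps lock contention per layer low.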


10) DevOps & distribution

  • Releases: all first-party images cosign-signed; labels embed org.stellaops.version and org.stellaops.release_date.

  • Channels:

    • Community (public registry): throttled, non-attesting.
    • Authorized (private registry): full speed, DSSE enabled.
  • Client update flow: containers self-verify signatures at boot; report version; Signer enforces valid_release_year / max_version from PoE before signing.

  • Compose skeleton:

services:
  authority:      { image: stellaops/authority }
  fulcio:         { image: sigstore/fulcio }
  rekor:          { image: sigstore/rekor-v2 }
  # Quoted so the embedded ":9001" cannot be misread inside a flow mapping.
  minio:          { image: minio/minio, command: 'server /data --console-address ":9001"' }
  mongo:          { image: "mongo:7" }
  signer:         { image: stellaops/signer, depends_on: [authority, fulcio] }
  attestor:       { image: stellaops/attestor, depends_on: [rekor, signer] }
  scanner-web:    { image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
  scanner-worker:
    image: stellaops/scanner-worker
    deploy: { replicas: 4 }
    depends_on: [scanner-web]
  concelier:      { image: stellaops/concelier-web, depends_on: [mongo] }
  excititor:      { image: stellaops/excititor-web, depends_on: [mongo] }
  ui:             { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor] }
  • Backups: Mongo dumps; MinIO versioned buckets & replication; Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation.

11) Observability & audit

  • Metrics: scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
  • Tracing: per-stage spans; correlation IDs across Scanner→Signer→Attestor.
  • Audit logs: every signing records license_id, image_digest, policy_digest, and Rekor UUID.
  • Compliance: MinIO Object Lock for immutable artifacts; reproducible outputs via policy digest + SBOM digest in predicate.

12) Roadmap (anchored to this architecture)

  • M2: Windows MSI/SxS/GAC analyzers; deeper Rust (DWARF enrichers).
  • M2: Buildx generator certified flows; cross-registry trust policies.
  • M3: Patch-Presence plugin (signature-based backport detection), opt-in.
  • M3: Zastava Admission control GA with policy presets and dry-run→enforce stages.
  • Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.

13) Canonical sequences (verification & signing)

Sign & log (OpTok + PoE, image verify, DSSE, Rekor).

sequenceDiagram
  autonumber
  participant Scan as Scanner.WebService
  participant Auth as Authority (OIDC)
  participant Sign as Signer
  participant Reg as OCI Registry
  participant Ful as Fulcio/KMS
  participant Att as Attestor
  participant Rek as Rekor v2

  Scan->>Auth: Get OpTok (DPoP/mTLS)
  Scan->>Sign: sign(request) + OpTok + PoE + DPoP proof
  Sign->>Auth: Validate OpTok & sender-constraint
  Sign->>Sign: Validate PoE (introspect/revocation)
  Sign->>Reg: Verify scanner image is StellaOps-signed (Referrers + cosign)
  alt OK
    Sign->>Ful: Get signing cert (keyless) or use KMS key
    Sign-->>Scan: DSSE bundle (cert chain)
    Scan->>Att: Submit bundle
    Att->>Rek: Create entry
    Rek-->>Att: {uuid,index,proof}
    Att-->>Scan: Rekor URL
  else Deny
    Sign-->>Scan: 403 (no attestation)
  end

Verification (third party).

@startuml
actor Verifier
participant "stellaops verify" as Tool
database "Fulcio/KMS root" as Root
participant "Rekor v2" as R2
Verifier -> Tool: bundle (URL/file)
Tool -> Tool: Verify DSSE signature
Tool -> Root: Verify cert chain to StellaOps root
Tool -> R2: Verify inclusion proof / query by UUID
Tool -> Verifier: OK + claims (license_id, policy_digest, version)
@enduml
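
The verification order above can be expressed as a list of named checks that stops at the first failure. In this sketch the checks are stubs supplied by the caller; real implementations of DSSE signature verification, certificate chain validation, and Rekor inclusion proofs are out of scope, and `first_failure` and the bundle keys are hypothetical names.

```python
from typing import Callable, Iterable, Optional, Tuple

Step = Tuple[str, Callable[[dict], bool]]


def first_failure(bundle: dict, steps: Iterable[Step]) -> Optional[str]:
    """Run checks in order; return the name of the first that fails, else None."""
    for name, check in steps:
        if not check(bundle):
            return name
    return None


# Stub checks standing in for real cryptographic verification.
STEPS = [
    ("dsse_signature", lambda b: b.get("signature_ok", False)),
    ("cert_chain", lambda b: b.get("chain_ok", False)),
    ("rekor_inclusion", lambda b: b.get("inclusion_ok", False)),
]

print(first_failure({"signature_ok": True, "chain_ok": True, "inclusion_ok": True}, STEPS))
print(first_failure({"signature_ok": True, "chain_ok": False}, STEPS))
```

Claims such as license_id and policy_digest should be reported only when the result is `None`, since they are trustworthy only after all three checks pass.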

End of high_level_architecture.md (Consolidated).