# 7 · High‑Level Architecture — **Stella Ops**

---

## 0 · Purpose & Scope

Give contributors, DevOps engineers and auditors a **complete yet readable map** of the Core:

* Major runtime components and message paths.
* Where plug‑ins, CLI helpers and runtime agents attach.
* Technology choices that enable the sub‑5‑second SBOM goal.
* Typical operational scenarios (pipeline scan, mute, nightly re‑scan, etc.).

Anything enterprise‑only (signed PDF, Crypto‑specific TLS, LDAP, enforcement) **must arrive as a plug‑in**; the Core never hard‑codes those concerns.

---

## 1 · Component Overview

| # | Component | Responsibility |
|---|-----------|---------------|
| 1 | **API Gateway** | REST endpoints (`/scan`, `/quota`, **`/token/offline`**); token auth; quota enforcement |
| 2 | **Scan Service** | SBOM parsing, Delta‑SBOM cache, vulnerability lookup |
| 3 | **Policy Engine** | YAML / (optional) Rego rule evaluation; verdict assembly |
| 4 | **Quota Service** | Per‑token counters; **333 scans/day**; waits & HTTP 429 |
| 5 | **Client‑JWT Issuer** | Issues 30‑day offline tokens; bundles them into OUK |
| 6 | **Registry** | Anonymous internal Docker registry for agents, SBOM uploads |
| 7 | **Web UI** | Angular SPA; dashboards, policy editor, quota banner |
| 8 | **Data Stores** | **Redis** (cache, quota) & **MongoDB** (SBOMs, findings, audit) |
| 9 | **Plugin Host** | Hot‑load .NET DLLs; isolates community plug‑ins |
| 10 | **Agents** | `sbom-builder`, `SanTech` scanner CLI, future `StellaOpsAttestor` |

```mermaid
flowchart TD
    subgraph "External Actors"
        DEV["Developer / DevSecOps / Manager"]
        CI["CI/CD Pipeline (e.g., SanTech CLI)"]
        K8S["Kubernetes Cluster (e.g., Zastava Agent)"]
    end

    subgraph "Stella Ops Runtime"
        subgraph "Core Services"
            CORE["Stella Core<br>(REST + gRPC APIs, Orchestration)"]
            REDIS[("Redis<br>(Cache, Queues, Trivy DB Mirror)")]
            MONGO[("MongoDB<br>(Optional: Long-term Storage)")]
            POL["Mute Policies<br>(OPA & YAML Evaluator)"]
            REG["StellaOps Registry<br>(Docker Registry v2)"]
            ATT["StellaOps Attestor<br>(SLSA + Rekor)"]
        end

        subgraph "Agents & Builders"
            SB["SBOM Builder<br>(Go Binary: Extracts Layers, Generates SBOMs)"]
            SA["SanTech Agent<br>(Pipeline Helper: Invokes Builder, Triggers Scans)"]
            ZA["Zastava Agent<br>(K8s Webhook: Enforces Policies, Inventories Containers)"]
        end

        subgraph "Scanners & UI"
            TRIVY["Trivy Scanner<br>(Plugin Container: Vulnerability Scanning)"]
            UI["Web UI<br>(Angular 17 + Tailwind: Dashboards, Policy Editor)"]
            CLI["Stella CLI<br>(CLI Helper: Triggers Scans, Mutes)"]
        end
    end

    DEV -->|Browses Findings, Mutes CVEs| UI
    DEV -->|Triggers Scans| CLI
    CI -->|Generates SBOM, Calls /scan| SA
    K8S -->|Inventories Containers, Enforces Gates| ZA

    UI -- "REST" --> CORE
    CLI -- "REST/gRPC" --> CORE
    SA -->|Scan Requests| CORE
    SB -->|Uploads SBOMs| CORE
    ZA -->|Policy Gates| CORE

    CORE -- "Queues, Caches" --> REDIS
    CORE -- "Persists Data" --> MONGO
    CORE -->|Evaluates Policies| POL
    CORE -->|Attests Provenance| ATT
    CORE -->|Scans Vulnerabilities| TRIVY

    SB -- "Pulls Images" --> REG
    SA -- "Pulls Images" --> REG
    ZA -- "Pulls Images" --> REG

    style DEV fill:#f9f,stroke:#333
    style CI fill:#f9f,stroke:#333
    style K8S fill:#f9f,stroke:#333
    style CORE fill:#ddf,stroke:#333
    style REDIS fill:#fdd,stroke:#333
    style MONGO fill:#fdd,stroke:#333
    style POL fill:#dfd,stroke:#333
    style REG fill:#dfd,stroke:#333
    style ATT fill:#dfd,stroke:#333
    style SB fill:#fdf,stroke:#333
    style SA fill:#fdf,stroke:#333
    style ZA fill:#fdf,stroke:#333
    style TRIVY fill:#ffd,stroke:#333
    style UI fill:#ffd,stroke:#333
    style CLI fill:#ffd,stroke:#333
```

* **Developer / DevSecOps / Manager** – browses findings, mutes CVEs, triggers scans.
* **SanTech CLI** – generates SBOMs and calls `/scan` during CI.
* **Zastava Agent** – inventories live containers; Core ships it in *passive* mode only (no kill).

### 1.1 Client‑JWT Lifecycle (offline aware)

1. **Online instance** – user signs in → `/connect/token` issues a JWT valid for 12 h.
2. **Offline instance** – a JWT with `exp ≈ 30 days` ships in the OUK; the backend **re‑signs** and stores it during import.
3. Tokens embed a `tier` claim (“Free”) and `maxScansPerDay: 333`.
4. The UI surfaces a red toast **7 days** before expiry.
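
The expiry warning in step 4 can be sketched as a small check on already‑decoded claims. This is a minimal illustration, not the Core's implementation: the function name `expiry_status` and the status strings are hypothetical, and only `exp`, `tier` and `maxScansPerDay` come from the text above.

```python
from datetime import datetime, timedelta, timezone

# From the text: the UI warns 7 days before the token expires.
WARN_WINDOW = timedelta(days=7)


def expiry_status(claims: dict, now: datetime) -> str:
    """Classify decoded claims as 'valid', 'expiring-soon' or 'expired'.

    Hypothetical helper; `claims["exp"]` is a Unix timestamp in seconds.
    """
    expires_at = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    if now >= expires_at:
        return "expired"
    if expires_at - now <= WARN_WINDOW:
        return "expiring-soon"
    return "valid"


# Example: a 30-day offline token issued on 2025-07-01 (illustrative date).
issued = datetime(2025, 7, 1, tzinfo=timezone.utc)
offline_claims = {
    "tier": "Free",
    "maxScansPerDay": 333,
    "exp": int((issued + timedelta(days=30)).timestamp()),
}

status_on_issue = expiry_status(offline_claims, issued)
status_day_24 = expiry_status(offline_claims, issued + timedelta(days=24))
status_day_31 = expiry_status(offline_claims, issued + timedelta(days=31))
```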

---

## 2 · Component Responsibilities (runtime view)

| Component | Core Responsibility | Implementation Highlights |
| --- | --- | --- |
| **Stella Core** | Orchestrates scans, persists SBOM blobs, serves REST/gRPC APIs, fans out jobs to scanners & policy engine. | .NET 8, CQRS, Redis Streams; pluggable runner interfaces. |
| **SBOM Builder** | Extracts image layers, queries Core for *missing* layers, generates SBOMs (multi‑format), uploads blobs. | Go binary; wraps Trivy & Syft libs. |
| **SanTech Agent** | Pipeline‑side helper; invokes Builder, triggers scan, streams progress back to CI/CD. | Static musl build. |
| **Zastava Agent** | K8s admission webhook enforcing policy verdicts before Pod creation. | Rust for sub‑10 ms latencies. |
| **UI** | Angular 17 SPA for dashboards, settings, policy editor. | Tailwind CSS; Webpack module federation (future). |
| **Redis** | Cache, queue, Trivy‑DB mirror, layer diffing. | Single instance or Sentinel. |
| **MongoDB** (opt.) | Long‑term SBOM & policy audit storage (> 180 days). | Optional; enabled via flag. |
| **StellaOps.Registry** | Anonymous read‑only Docker v2 registry with optional Cosign verification. | `registry :2` behind nginx reverse proxy. |
| **StellaOps.MutePolicies** | YAML/Rego evaluator, policy version store, `/policy/*` API. | Embeds OPA‑WASM; falls back to `opa exec`. |
| **StellaOpsAttestor** | Generate SLSA provenance & Rekor signatures; verify on demand. | Side‑car container; DSSE + Rekor CLI. |

All cross‑component calls go through dependency‑injected interfaces; no component reaches into another's internals.

---

## 3 · Principal Backend Modules & Plug‑in Hooks

| Namespace | Responsibility | Built‑in Tech / Default | Plug‑in Contract |
| --- | --- | --- | --- |
| `configuration` | Parse env/JSON, health‑check endpoint | .NET 9 Options | `IConfigValidator` |
| `identity` | Embedded OAuth2/OIDC (OpenIddict 6) | MIT OpenIddict | `IIdentityProvider` for LDAP/SAML/JWT gateway |
| `pluginloader` | Discover DLLs, SemVer gate, optional Cosign verify | Reflection + Cosign | `IPluginLifecycleHook` for telemetry |
| `scanning` | SBOM‑ & image‑flow orchestration; runner pool | Trivy CLI (default) | `IScannerRunner` – e.g., Grype, Copacetic, Clair |
| `feedmerger` | Nightly NVD merge & feed enrichment | Hangfire job | drop‑in `*.Schedule.dll` for OSV, GHSA, BDU feeds |
| `tls` | TLS provider abstraction | OpenSSL | `ITlsProvider` for GOST, SM‑series, custom suites |
| `reporting` | Render HTML/PDF reports | RazorLight | `IReportRenderer` |
| `ui` | Angular SPA & i18n | Angular 17 | new locales via `/locales/{lang}.json` |
| `scheduling` | Cron + retries | Hangfire | any recurrent job via `*.Schedule.dll` |

```mermaid
classDiagram
    class configuration
    class identity
    class pluginloader
    class scanning
    class feedmerger
    class tls
    class reporting
    class ui
    class scheduling

    class AllModules

    configuration ..> identity : Uses
    identity ..> pluginloader : Authenticates Plugins
    pluginloader ..> scanning : Loads Scanner Runners
    scanning ..> feedmerger : Triggers Feed Merges
    tls ..> AllModules : Provides TLS Abstraction
    reporting ..> ui : Renders Reports for UI
    scheduling ..> feedmerger : Schedules Nightly Jobs

    note for scanning "Pluggable: IScannerRunner<br>e.g., Trivy, Grype"
    note for feedmerger "Pluggable: *.Schedule.dll<br>e.g., OSV, GHSA Feeds"
    note for identity "Pluggable: IIdentityProvider<br>e.g., LDAP, SAML"
    note for reporting "Pluggable: IReportRenderer<br>e.g., Custom PDF"
```
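
To make the runner‑pool idea concrete, here is a language‑neutral sketch in Python of how a pluggable scanner contract and its registry might fit together. The real contract is the .NET `IScannerRunner` interface, whose members this document does not list; every name below (`ScannerRunner`, `RunnerPool`, `scan`, `FakeRunner`, the placeholder finding ID) is an assumption made for illustration.

```python
from __future__ import annotations

from typing import Protocol


class ScannerRunner(Protocol):
    """Hypothetical analogue of a scanner plug-in contract."""

    name: str

    def scan(self, sbom: dict) -> list[dict]:
        """Return a list of findings for the given SBOM."""
        ...


class RunnerPool:
    """Keeps registered runners by name and dispatches scans to them."""

    def __init__(self) -> None:
        self._runners: dict[str, ScannerRunner] = {}

    def register(self, runner: ScannerRunner) -> None:
        self._runners[runner.name] = runner

    def scan(self, engine: str, sbom: dict) -> list[dict]:
        if engine not in self._runners:
            raise KeyError(f"no runner registered for engine '{engine}'")
        return self._runners[engine].scan(sbom)


class FakeRunner:
    """Stand-in runner that reports one placeholder finding per component."""

    name = "fake"

    def scan(self, sbom: dict) -> list[dict]:
        return [
            {"id": "CVE-0000-0001", "package": component}
            for component in sbom.get("components", [])
        ]


pool = RunnerPool()
pool.register(FakeRunner())
findings = pool.scan("fake", {"components": ["openssl"]})
```

Because the pool depends only on the contract, swapping the default engine for another runner would not require changes to the calling code.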

**When remaining = 0:** the API returns `429 Too Many Requests` with `Retry-After: <UTC‑midnight>` (sequence omitted for brevity).
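
A minimal sketch of that quota response, assuming the daily counter resets at UTC midnight and that `Retry-After` is sent as an HTTP date. The helper names `next_utc_midnight` and `check_quota` are hypothetical; only the 333 scans/day limit, the 429 status and the header name come from this document.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

DAILY_LIMIT = 333  # scans per day, per token (from the Quota Service row)


def next_utc_midnight(now: datetime) -> datetime:
    """Return the first UTC midnight strictly after `now` (timezone-aware)."""
    tomorrow = (now.astimezone(timezone.utc) + timedelta(days=1)).date()
    return datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)


def check_quota(used_today: int, now: datetime) -> tuple[int, dict]:
    """Return (status_code, headers) for a scan request. Hypothetical helper."""
    if used_today < DAILY_LIMIT:
        return 200, {}
    retry_after = format_datetime(next_utc_midnight(now), usegmt=True)
    return 429, {"Retry-After": retry_after}


now = datetime(2025, 7, 11, 15, 30, tzinfo=timezone.utc)
ok_status, ok_headers = check_quota(332, now)
limited_status, limited_headers = check_quota(333, now)
```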

---

## 4 · Data Flows

### 4.1 SBOM‑First (≤ 5 s P95)

* The Builder produces the SBOM locally, so the Core never touches the Docker socket.
* The Trivy path completes in ≤ 5 s on `alpine:3.19` with a warmed DB.
* The image‑unpack fallback stays ≤ 10 s for 200 MB images.

```mermaid
sequenceDiagram
    participant CI as CI/CD Pipeline (SanTech Agent)
    participant SB as SBOM Builder
    participant CORE as Stella Core
    participant REDIS as Redis Queue
    participant RUN as Scanner Runner (e.g., Trivy)
    participant POL as Policy Evaluator

    CI->>SB: Invoke SBOM Generation
    SB->>CORE: Check Missing Layers (/layers/missing)
    CORE->>REDIS: Query Layer Diff (SDIFF)
    REDIS-->>CORE: Missing Layers List
    CORE-->>SB: Return Missing Layers
    SB->>SB: Generate Delta SBOM
    SB->>CORE: Upload SBOM Blob (POST /scan(sbom))
    CORE->>REDIS: Enqueue Scan Job
    REDIS->>RUN: Fan Out to Runner
    RUN->>RUN: Perform Vulnerability Scan
    RUN-->>CORE: Return Scan Results
    CORE->>POL: Evaluate Mute Policies
    POL-->>CORE: Policy Verdict
    CORE-->>CI: JSON Verdict & Progress Stream
    Note over CORE,CI: Achieves ≤5s P95 with Warmed DB
```

### 4.2 Delta SBOM

1. The Builder collects layer digests.
2. `POST /layers/missing` → Redis SDIFF → missing layer list (< 20 ms).
3. An SBOM is generated only for those layers and uploaded.
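
The layer diff in step 2 is a set difference, which is what Redis `SDIFF` computes server‑side. The sketch below reproduces the same logic in plain Python so it runs without a Redis instance; the function name `missing_layers` and the digests are made up for illustration.

```python
def missing_layers(image_layers: list[str], known_layers: set[str]) -> list[str]:
    """Return the image's layer digests that the cache has not seen yet.

    Equivalent in effect to a set difference (image layers minus known layers),
    but preserves the image's layer order.
    """
    return [digest for digest in image_layers if digest not in known_layers]


# Illustrative digests only.
known = {"sha256:base", "sha256:runtime"}
image = ["sha256:base", "sha256:runtime", "sha256:app"]

to_scan = missing_layers(image, known)
```

Only the digests in `to_scan` need a fresh SBOM, which is why an unchanged base image adds almost no work to a rebuild.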

### 4.3 Feed Enrichment

```mermaid
sequenceDiagram
    participant CRON as Nightly Cron (Hangfire)
    participant FM as Feed Merger
    participant NVD as NVD Feed
    participant OSV as OSV Plugin (Optional)
    participant GHSA as GHSA Plugin (Optional)
    participant BDU as BDU Plugin (Optional)
    participant REDIS as Redis (Merged Feed Storage)
    participant UI as Web UI

    CRON->>FM: Trigger at 00:59
    FM->>NVD: Fetch & Merge NVD Data
    opt Optional Plugins
        FM->>OSV: Merge OSV Feed
        FM->>GHSA: Merge GHSA Feed
        FM->>BDU: Merge BDU Feed
    end
    FM->>REDIS: Persist Merged Feed
    REDIS-->>UI: Update Feed Freshness
    UI->>UI: Display Green 'Feed Age' Tile
```
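
The merge step can be pictured as folding several feeds into one record per advisory ID. The document does not state the merge rule, so the sketch below assumes one: the primary feed (NVD) wins on conflicting fields and optional feeds only fill gaps. The function name, record fields and IDs are all hypothetical.

```python
def merge_feeds(primary: dict, *extras: dict) -> dict:
    """Merge advisory feeds keyed by ID.

    Assumed rule (not from the source): values from `primary` are never
    overwritten; later feeds only add missing IDs or missing fields.
    """
    merged = {adv_id: dict(record) for adv_id, record in primary.items()}
    for feed in extras:
        for adv_id, record in feed.items():
            target = merged.setdefault(adv_id, {})
            for field, value in record.items():
                target.setdefault(field, value)
    return merged


# Placeholder IDs and fields for illustration.
nvd = {"CVE-0000-0001": {"severity": "HIGH"}}
osv = {
    "CVE-0000-0001": {"severity": "MEDIUM", "fixed_in": "1.2.3"},
    "CVE-0000-0002": {"severity": "LOW"},
}

merged_feed = merge_feeds(nvd, osv)
```

A different precedence rule (for example, highest severity wins) would change only the inner loop.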

### 4.4 Identity & Auth Flow

OpenIddict issues JWTs via the client‑credentials or password grant.
An `IIdentityProvider` plug‑in can delegate to LDAP, SAML or external OIDC without Core changes.

---

## 5 · Runtime Helpers

| Helper | Form | Purpose | Extensible Bits |
|--------|------|---------|-----------------|
| **SanTech** | Distroless CLI | Generates SBOM, calls `/scan`, honours threshold flag | `--engine`, `--pdf-out` piped to plug‑ins |
| **Zastava** | Static Go binary / DaemonSet | Watches Docker/CRI‑O events; uploads SBOMs; can enforce gate | Policy plug‑in could alter thresholds |

---

## 6 · Persistence & Cache Strategy

| Store | Primary Use | Why chosen |
|-------|-------------|------------|
| **Redis 7** | Queue, SBOM cache, Trivy DB mirror | Sub‑1 ms P99 latency |
| **MongoDB** | History > 180 d, audit logs, policy versions | Optional; document‑oriented |
| **Local tmpfs** | Trivy layer cache (`/var/cache/trivy`) | Keeps disk I/O off hot path |

```mermaid
flowchart LR
    subgraph "Persistence Layers"
        REDIS[("Redis: Fast Cache/Queues<br>Sub-1ms P99")]
        MONGO[("MongoDB: Optional Audit/History<br>>180 Days")]
        TMPFS[("Local tmpfs: Trivy Layer Cache<br>Low I/O Overhead")]
    end

    CORE["Stella Core"] -- Queues & SBOM Cache --> REDIS
    CORE -- Long-term Storage --> MONGO
    TRIVY["Trivy Scanner"] -- Layer Unpack Cache --> TMPFS

    style REDIS fill:#fdd,stroke:#333
    style MONGO fill:#dfd,stroke:#333
    style TMPFS fill:#ffd,stroke:#333
```

---

## 7 · Typical Scenarios

| # | Flow | Steps |
|---|------|-------|
| **S‑1** | Pipeline Scan & Alert | SanTech → SBOM → `/scan` → policy verdict → CI exit code & link to *Scan Detail* |
| **S‑2** | Mute Noisy CVE | Dev toggles **Mute** in UI → rule stored in Redis → next build passes |
| **S‑3** | Nightly Re‑scan | `SbomNightly.Schedule` re‑queues SBOMs (mask‑filter) → dashboard highlights new Criticals |
| **S‑4** | Feed Update Cycle | `FeedMerger` merges feeds → UI *Feed Age* tile turns green |
| **S‑5** | Custom Report Generation | Plug‑in registers `IReportRenderer` → `/report/custom/{digest}` → CI downloads artifact |

```mermaid
sequenceDiagram
    participant DEV as Developer
    participant UI as Web UI
    participant CORE as Stella Core
    participant REDIS as Redis
    participant RUN as Scanner Runner
    participant CI as CI/CD Pipeline

    DEV->>UI: Toggle Mute for CVE
    UI->>CORE: Update Mute Rule (POST /policy/mute)
    CORE->>REDIS: Store Mute Policy
    Note over CORE,REDIS: YAML/Rego Evaluator Updates

    alt Next Pipeline Build
        CI->>CORE: Trigger Scan (POST /scan)
        CORE->>RUN: Enqueue & Scan
        RUN-->>CORE: Raw Findings
        CORE->>REDIS: Apply Mute Policies
        REDIS-->>CORE: Filtered Verdict (Passes)
        CORE-->>CI: Success Exit Code
    end
```
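
The effect of scenario S‑2 on the next build can be sketched as a filter applied before the verdict is computed. This is an assumption‑laden illustration: the finding shape, the severity names, the default threshold and the helpers `apply_mutes` and `exit_code` are hypothetical, not the policy engine's actual model.

```python
def apply_mutes(findings: list[dict], muted_ids: set[str]) -> list[dict]:
    """Drop findings whose ID is covered by a mute rule."""
    return [finding for finding in findings if finding["id"] not in muted_ids]


def exit_code(findings: list[dict], muted_ids: set[str],
              fail_on: tuple[str, ...] = ("CRITICAL", "HIGH")) -> int:
    """Return 0 (pass) or 1 (fail) for CI, judging only unmuted findings."""
    active = apply_mutes(findings, muted_ids)
    return 1 if any(f["severity"] in fail_on for f in active) else 0


# Placeholder findings for illustration.
raw_findings = [
    {"id": "CVE-0000-0001", "severity": "CRITICAL"},
    {"id": "CVE-0000-0002", "severity": "LOW"},
]

before_mute = exit_code(raw_findings, muted_ids=set())
after_mute = exit_code(raw_findings, muted_ids={"CVE-0000-0001"})
```

The raw findings are left untouched, so a muted CVE stays visible for audit while no longer failing the build.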

```mermaid
sequenceDiagram
    participant CRON as SbomNightly.Schedule
    participant CORE as Stella Core
    participant REDIS as Redis Queue
    participant RUN as Scanner Runner
    participant UI as Dashboard

    CRON->>CORE: Re-queue SBOMs (Mask-Filter)
    CORE->>REDIS: Enqueue Filtered Jobs
    REDIS->>RUN: Fan Out to Runners
    RUN-->>CORE: New Scan Results
    CORE->>UI: Highlight New Criticals
    Note over CORE,UI: Focus on Changes Since Last Scan
```

---

## 8 · UI Fast Facts

* **Stack** – Angular 17 + Vite dev server; Tailwind CSS.
* **State** – Signals + RxJS for live scan progress.
* **i18n / l10n** – JSON bundles served from `/locales/{lang}.json`.
* **Module Structure** – Lazy‑loaded feature modules (`dashboard`, `scans`, `settings`); runtime route injection by UI plug‑ins (road‑map Q2‑2026).

---

## 9 · Cross‑Cutting Concerns

* **Security** – containers run non‑root, `CAP_DROP:ALL`, read‑only FS, hardened seccomp profiles.
* **Observability** – Serilog JSON, OpenTelemetry OTLP exporter, Prometheus `/metrics`.
* **Upgrade Policy** – `/api/v1` endpoints & CLI flags stable across a minor; breaking changes bump major.
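
The upgrade policy implies a simple compatibility rule for clients and plug‑ins: same major version, and a host minor version at least as new as the one the client was built against. The sketch below encodes that reading; it is an interpretation of the bullet above, and `parse_version` and `is_compatible` are hypothetical names rather than part of the plug‑in loader.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release suffixes)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(host: str, built_against: str) -> bool:
    """True if a client built against `built_against` should work on `host`.

    Assumed rule: majors must match; the host may be on a newer minor.
    """
    host_major, host_minor, _ = parse_version(host)
    client_major, client_minor, _ = parse_version(built_against)
    return host_major == client_major and host_minor >= client_minor


same_major_newer_host = is_compatible("1.5.0", "1.2.3")
older_host = is_compatible("1.1.0", "1.2.0")
major_bump = is_compatible("2.0.0", "1.9.9")
```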

---

## 10 · Performance & Scalability

| Scenario | P95 target | Bottleneck | Mitigation |
|----------|-----------:|------------|------------|
| SBOM‑first | ≤ 5 s | Redis queue | More CPU, increase `ScannerPool.Workers` |
| Image‑unpack | ≤ 10 s | Layer unpack | Prefer SBOM path, warm Docker cache |
| High concurrency | 40 rps | Runner CPU | Scale Core replicas + side‑car scanner services |

---

## 11 · Future Architectural Anchors

* **ScanService micro‑split (gRPC)** – isolate heavy runners for large clusters.
* **UI route plug‑ins** – dynamic Angular module loader (road‑map Q2‑2026).
* **Redis Cluster** – transparently sharded cache once sustained > 100 rps.

---

## 12 · Assumptions & Trade‑offs

* Requires Docker/CRI‑O runtime; .NET 9 available on hosts; Windows containers are out‑of‑scope this cycle.
* Embedded auth simplifies deployment but may need plug‑ins for enterprise IdPs.
* Speed is prioritised over exhaustive feature parity with heavyweight commercial scanners.

---

## 13 · References & Further Reading

* **C4 Model** – <https://c4model.com>
* **.NET Architecture Guides** – <https://learn.microsoft.com/dotnet/architecture>
* **OSS Examples** – Kubernetes Architecture docs, Prometheus design papers, Backstage.

---

## 14 · Change Log

*(Git history is authoritative; tables for reader convenience.)*

| Date | Note |
|------|------|
| 2025‑07‑13 | Added internal registry, multi‑format SBOM, delta flow, Policy as Code, Attestor integration sections. |
| 2025‑07‑12 | Re‑organised doc around C4; added diagrams, trade‑offs, security notes. |
| 2025‑07‑11 | Initial open‑sourcing pass – removed commercial references, added plug‑in hooks and UI details. |

| Version | Date | Notes |
| ------- | ---- | ----- |
| v2.4 | 2025‑07‑12 | New modules. |
| v2.3 | 2025‑07‑12 | Adopted C4 structure, added diagrams/trade-offs/security notes/contribution hooks/references. |
| v2.2 | 2025‑07‑11 | Removed last commercial refs; added TLS/IdP/report plug‑in hooks; deeper UI & scenario sections. |
| v2.1 | 2025‑07‑11 | Added scenarios, UI details, plug‑in depth. |
| v2.0 | 2025‑07‑11 | Full rewrite. |

*(End of High‑Level Architecture v2.4)*