Files
git.stella-ops.org/docs/07_HIGH_LEVEL_ARCHITECTURE.md

414 lines
20 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#7 · HighLevel Architecture — **StellaOps**
---
##0Purpose &Scope
Give contributors, DevOps engineers and auditors a **complete yet readable map** of the Core:
* Major runtime components and message paths.
* Where plugins, CLI helpers and runtime agents attach.
* Technology choices that enable the sub5second SBOM goal.
* Typical operational scenarios (pipeline scan, mute, nightly rescan, etc.).
Anything enterpriseonly (signed PDF, Cryptospecific TLS, LDAP, enforcement) **must arrive as a plugin**; the Core never hardcodes those concerns.
---
##1Component Overview
| # | Component | Responsibility |
|---|-----------|---------------|
| 1 | **API Gateway** | REST endpoints (`/scan`, `/quota`, **`/token/offline`**); token auth; quota enforcement |
| 2 | **Scan Service** | SBOM parsing, DeltaSBOM cache, vulnerability lookup |
| 3 | **Policy Engine** | YAML / (optional) Rego rule evaluation; verdict assembly |
| 4 | **Quota Service** | Pertoken counters; **333 scans/day**; waits & HTTP 429 |
| 5 | **ClientJWT Issuer** | Issues 30day offline tokens; bundles them into OUK |
| 5 | **Registry** | Anonymous internal Docker registry for agents, SBOM uploads |
| 6 | **Web UI** | React/Blazor SPA; dashboards, policy editor, quota banner |
| 7 | **Data Stores** | **Redis** (cache, quota) & **MongoDB** (SBOMs, findings, audit) |
| 8 | **Plugin Host** | Hotload .NET DLLs; isolates community plugins |
| 9 | **Agents** | `sbombuilder`, `SanTech` scanner CLI, future `StellaOpsAttestor` |
```mermaid
flowchart TD
subgraph "External Actors"
DEV["Developer / DevSecOps / Manager"]
CI["CI/CD Pipeline (e.g., SanTech CLI)"]
K8S["Kubernetes Cluster (e.g., Zastava Agent)"]
end
subgraph "Stella Ops Runtime"
subgraph "Core Services"
CORE["Stella Core<br>(REST + gRPC APIs, Orchestration)"]
REDIS[("Redis<br>(Cache, Queues, Trivy DB Mirror)")]
MONGO[("MongoDB<br>(Optional: Long-term Storage)")]
POL["Mute Policies<br>(OPA & YAML Evaluator)"]
REG["StellaOps Registry<br>(Docker Registry v2)"]
ATT["StellaOps Attestor<br>(SLSA + Rekor)"]
end
subgraph "Agents & Builders"
SB["SBOM Builder<br>(Go Binary: Extracts Layers, Generates SBOMs)"]
SA["SanTech Agent<br>(Pipeline Helper: Invokes Builder, Triggers Scans)"]
ZA["Zastava Agent<br>(K8s Webhook: Enforces Policies, Inventories Containers)"]
end
subgraph "Scanners & UI"
TRIVY["Trivy Scanner<br>(Plugin Container: Vulnerability Scanning)"]
UI["Web UI<br>(Vue3 + Tailwind: Dashboards, Policy Editor)"]
CLI["Stella CLI<br>(CLI Helper: Triggers Scans, Mutes)"]
end
end
DEV -->|Browses Findings, Mutes CVEs| UI
DEV -->|Triggers Scans| CLI
CI -->|Generates SBOM, Calls /scan| SA
K8S -->|Inventories Containers, Enforces Gates| ZA
UI -- "REST" --> CORE
CLI -- "REST/gRPC" --> CORE
SA -->|Scan Requests| CORE
SB -->|Uploads SBOMs| CORE
ZA -->|Policy Gates| CORE
CORE -- "Queues, Caches" --> REDIS
CORE -- "Persists Data" --> MONGO
CORE -->|Evaluates Policies| POL
CORE -->|Attests Provenance| ATT
CORE -->|Scans Vulnerabilities| TRIVY
SB -- "Pulls Images" --> REG
SA -- "Pulls Images" --> REG
ZA -- "Pulls Images" --> REG
style DEV fill:#f9f,stroke:#333
style CI fill:#f9f,stroke:#333
style K8S fill:#f9f,stroke:#333
style CORE fill:#ddf,stroke:#333
style REDIS fill:#fdd,stroke:#333
style MONGO fill:#fdd,stroke:#333
style POL fill:#dfd,stroke:#333
style REG fill:#dfd,stroke:#333
style ATT fill:#dfd,stroke:#333
style SB fill:#fdf,stroke:#333
style SA fill:#fdf,stroke:#333
style ZA fill:#fdf,stroke:#333
style TRIVY fill:#ffd,stroke:#333
style UI fill:#ffd,stroke:#333
style CLI fill:#ffd,stroke:#333
```
* **Developer / DevSecOps / Manager** browses findings, mutes CVEs, triggers scans.
* **Santech CLI** generates SBOMs and calls `/scan` during CI.
* **Zastava Agent** inventories live containers; Core ships it in *passive* mode only (no kill).
###1.1ClientJWT Lifecycle (offline aware)
1. **Online instance** user signs in → `/connect/token` issues JWT valid 12h.
2. **Offline instance** JWT with `exp 30days` ships in OUK; backend
**resigns** and stores it during import.
3. Tokens embed a `tier` claim (“Free”) and `maxScansPerDay: 333`.
4. On expiry the UI surfaces a red toast **7days** in advance.
---
##2·Component Responsibilities (runtime view)
| Component | Core Responsibility | Implementation Highlights |
| -------------------------- | ---------------------------------------------------------------------------------------------------------- | --------------------------------------------------------- |
| **Stella Core** | Orchestrates scans, persists SBOM blobs, serves REST/gRPC APIs, fans out jobs to scanners & policy engine. | .NET8, CQRS, Redis Streams; pluggable runner interfaces. |
| **SBOM Builder** | Extracts image layers, queries Core for *missing* layers, generates SBOMs (multiformat), uploads blobs. | Go binary; wraps Trivy & Syft libs. |
| **SanTech Agent** | Pipelineside helper; invokes Builder, triggers scan, streams progress back to CI/CD. | Static musl build. |
| **Zastava Agent** | K8s admission webhook enforcing policy verdicts before Pod creation. | Rust for sub10ms latencies. |
| **UI** | Angular17 SPA for dashboards, settings, policy editor. | Tailwind CSS; Webpack module federation (future). |
| **Redis** | Cache, queue, TrivyDB mirror, layer diffing. | Single instance or Sentinel. |
| **MongoDB** (opt.) | Longterm SBOM & policy audit storage (>180days). | Optional; enabled via flag. |
| **StellaOps.Registry** | Anonymous readonly Docker v2 registry with optional Cosign verification. | `registry :2` behind nginx reverse proxy. |
| **StellaOps.MutePolicies** | YAML/Rego evaluator, policy version store, `/policy/*` API. | Embeds OPAWASM; falls back to `opa exec`. |
| **StellaOpsAttestor** | Generate SLSA provenance & Rekor signatures; verify on demand. | Sidecar container; DSSE + Rekor CLI. |
All crosscomponent calls use dependencyinjected interfaces—no
intracomponent reachins.
---
##3·Principal Backend Modules & Plugin Hooks
| Namespace | Responsibility | Builtin Tech / Default | Plugin Contract |
| --------------- | -------------------------------------------------- | ----------------------- | ------------------------------------------------- |
| `configuration` | Parse env/JSON, healthcheck endpoint | .NET9 Options | `IConfigValidator` |
| `identity` | Embedded OAuth2/OIDC (OpenIddict 6) | MIT OpenIddict | `IIdentityProvider` for LDAP/SAML/JWT gateway |
| `pluginloader` | Discover DLLs, SemVer gate, optional Cosign verify | Reflection + Cosign | `IPluginLifecycleHook` for telemetry |
| `scanning` | SBOM & imageflow orchestration; runner pool | Trivy CLI (default) | `IScannerRunner` e.g., Grype, Copacetic, Clair |
| `feedmerger` | Nightly NVD merge & feed enrichment | Hangfire job | dropin `*.Schedule.dll` for OSV, GHSA, BDU feeds |
| `tls` | TLS provider abstraction | OpenSSL | `ITlsProvider` for GOST, SMseries, custom suites |
| `reporting` | Render HTML/PDF reports | RazorLight | `IReportRenderer` |
| `ui` | Angular SPA & i18n | Angular 17 | new locales via `/locales/{lang}.json` |
| `scheduling` | Cron + retries | Hangfire | any recurrent job via `*.Schedule.dll` |
```mermaid
classDiagram
class configuration
class identity
class pluginloader
class scanning
class feedmerger
class tls
class reporting
class ui
class scheduling
class AllModules
configuration ..> identity : Uses
identity ..> pluginloader : Authenticates Plugins
pluginloader ..> scanning : Loads Scanner Runners
scanning ..> feedmerger : Triggers Feed Merges
tls ..> AllModules : Provides TLS Abstraction
reporting ..> ui : Renders Reports for UI
scheduling ..> feedmerger : Schedules Nightly Jobs
note for scanning "Pluggable: ISScannerRunner<br>e.g., Trivy, Grype"
note for feedmerger "Pluggable: *.Schedule.dll<br>e.g., OSV, GHSA Feeds"
note for identity "Pluggable: IIdentityProvider<br>e.g., LDAP, SAML"
note for reporting "Pluggable: IReportRenderer<br>e.g., Custom PDF"
```
**When remaining =0:**
API returns `429 Too Many Requests`, `RetryAfter: <UTCmidnight>` (sequence omitted for brevity).
---
##4·Data Flows
###4.1SBOMFirst (≤5s P95)
Builder produces SBOM locally, so Core never touches the Docker
socket.
Trivy path hits ≤5s on alpine:3.19 with warmed DB.
Imageunpack fallback stays ≤10s for 200MB images.
```mermaid
sequenceDiagram
participant CI as CI/CD Pipeline (SanTech Agent)
participant SB as SBOM Builder
participant CORE as Stella Core
participant REDIS as Redis Queue
participant RUN as Scanner Runner (e.g., Trivy)
participant POL as Policy Evaluator
CI->>SB: Invoke SBOM Generation
SB->>CORE: Check Missing Layers (/layers/missing)
CORE->>REDIS: Query Layer Diff (SDIFF)
REDIS-->>CORE: Missing Layers List
CORE-->>SB: Return Missing Layers
SB->>SB: Generate Delta SBOM
SB->>CORE: Upload SBOM Blob (POST /scan(sbom))
CORE->>REDIS: Enqueue Scan Job
REDIS->>RUN: Fan Out to Runner
RUN->>RUN: Perform Vulnerability Scan
RUN-->>CORE: Return Scan Results
CORE->>POL: Evaluate Mute Policies
POL-->>CORE: Policy Verdict
CORE-->>CI: JSON Verdict & Progress Stream
Note over CORE,CI: Achieves ≤5s P95 with Warmed DB
```
###4.2Delta SBOM
Builder collects layer digests.
`POST /layers/missing` → Redis SDIFF → missing layer list (<20ms).
SBOM generated only for those layers and uploaded.
###4.3Feed Enrichment
```mermaid
sequenceDiagram
participant CRON as Nightly Cron (Hangfire)
participant FM as Feed Merger
participant NVD as NVD Feed
participant OSV as OSV Plugin (Optional)
participant GHSA as GHSA Plugin (Optional)
participant BDU as BDU Plugin (Optional)
participant REDIS as Redis (Merged Feed Storage)
participant UI as Web UI
CRON->>FM: Trigger at 00:59
FM->>NVD: Fetch & Merge NVD Data
alt Optional Plugins
FM->>OSV: Merge OSV Feed
FM->>GHSA: Merge GHSA Feed
FM->>BDU: Merge BDU Feed
end
FM->>REDIS: Persist Merged Feed
REDIS-->>UI: Update Feed Freshness
UI->>UI: Display Green 'Feed Age' Tile
```
###4.4Identity & Auth Flow
OpenIddict issues JWTs via clientcredentials or password grant.
An IIdentityProvider plugin can delegate to LDAP, SAML or external OIDC
without Core changes.
---
##5·Runtime Helpers
| Helper | Form | Purpose | Extensible Bits |
|-----------|---------------------------------------|--------------------------------------------------------------------|-------------------------------------------|
| **SanTech** | Distroless CLI | Generates SBOM, calls `/scan`, honours threshold flag | `--engine`, `--pdf-out` piped to plugins |
| **Zastava** | Static Go binary / DaemonSet | Watches Docker/CRIO events; uploads SBOMs; can enforce gate | Policy plugin could alter thresholds |
---
##6·Persistence & Cache Strategy
| Store | Primary Use | Why chosen |
|----------------|-----------------------------------------------|--------------------------------|
| **Redis7** | Queue, SBOM cache, Trivy DB mirror | Sub1ms P99 latency |
| **MongoDB** | History>180d, audit logs, policy versions | Optional; documentoriented |
| **Local tmpfs**| Trivy layer cache (`/var/cache/trivy`) | Keeps disk I/O off hot path |
```mermaid
flowchart LR
subgraph "Persistence Layers"
REDIS[(Redis: Fast Cache/Queues<br>Sub-1ms P99)]
MONGO[(MongoDB: Optional Audit/History<br>>180 Days)]
TMPFS[(Local tmpfs: Trivy Layer Cache<br>Low I/O Overhead)]
end
CORE["Stella Core"] -- Queues & SBOM Cache --> REDIS
CORE -- Long-term Storage --> MONGO
TRIVY["Trivy Scanner"] -- Layer Unpack Cache --> TMPFS
style REDIS fill:#fdd,stroke:#333
style MONGO fill:#dfd,stroke:#333
style TMPFS fill:#ffd,stroke:#333
```
---
##7·Typical Scenarios
| # | Flow | Steps |
|---------|----------------------------|-------------------------------------------------------------------------------------------------|
| **S1** | Pipeline Scan & Alert | SanTech → SBOM → `/scan` → policy verdict → CI exit code & link to *Scan Detail* |
| **S2** | Mute Noisy CVE | Dev toggles **Mute** in UI → rule stored in Redis → next build passes |
| **S3** | Nightly Rescan | `SbomNightly.Schedule` requeues SBOMs (maskfilter) → dashboard highlights new Criticals |
| **S4** | Feed Update Cycle | `FeedMerger` merges feeds → UI *Feed Age* tile turns green |
| **S5** | Custom Report Generation | Plugin registers `IReportRenderer``/report/custom/{digest}` → CI downloads artifact |
```mermaid
sequenceDiagram
participant DEV as Developer
participant UI as Web UI
participant CORE as Stella Core
participant REDIS as Redis
participant RUN as Scanner Runner
DEV->>UI: Toggle Mute for CVE
UI->>CORE: Update Mute Rule (POST /policy/mute)
CORE->>REDIS: Store Mute Policy
Note over CORE,REDIS: YAML/Rego Evaluator Updates
alt Next Pipeline Build
CI->>CORE: Trigger Scan (POST /scan)
CORE->>RUN: Enqueue & Scan
RUN-->>CORE: Raw Findings
CORE->>REDIS: Apply Mute Policies
REDIS-->>CORE: Filtered Verdict (Passes)
CORE-->>CI: Success Exit Code
end
```
```mermaid
sequenceDiagram
participant CRON as SbomNightly.Schedule
participant CORE as Stella Core
participant REDIS as Redis Queue
participant RUN as Scanner Runner
participant UI as Dashboard
CRON->>CORE: Re-queue SBOMs (Mask-Filter)
CORE->>REDIS: Enqueue Filtered Jobs
REDIS->>RUN: Fan Out to Runners
RUN-->>CORE: New Scan Results
CORE->>UI: Highlight New Criticals
Note over CORE,UI: Focus on Changes Since Last Scan
```
---
##8·UIFastFacts
* **Stack** Angular17 + Vite dev server; Tailwind CSS.
* **State** Signals + RxJS for live scan progress.
* **i18n / l10n** JSON bundles served from `/locales/{lang}.json`.
* **ModuleStructure** Lazyloaded feature modules (`dashboard`, `scans`, `settings`); runtime route injection by UI plugins (roadmap Q22026).
---
##9·CrossCutting Concerns
* **Security** containers run nonroot, `CAP_DROP:ALL`, readonly FS, hardened seccomp profiles.
* **Observability** Serilog JSON, OpenTelemetry OTLP exporter, Prometheus `/metrics`.
* **Upgrade Policy** `/api/v1` endpoints & CLI flags stable across a minor; breaking changes bump major.
---
##10·Performance & Scalability
| Scenario | P95 target | Bottleneck | Mitigation |
|-----------------|-----------:|-----------------|-------------------------------------------------|
| SBOMfirst | ≤5s | Redis queue | More CPU, increase `ScannerPool.Workers` |
| Imageunpack | ≤10s | Layer unpack | Prefer SBOM path, warm Docker cache |
| High concurrency| 40rps | Runner CPU | Scale Core replicas + sidecar scanner services |
---
##11·Future Architectural Anchors
* **ScanService microsplit (gRPC)** isolate heavy runners for large clusters.
* **UI route plugins** dynamic Angular module loader (roadmap Q22026).
* **Redis Cluster** transparently sharded cache once sustained>100rps.
---
##12·Assumptions & Tradeoffs
Requires Docker/CRIO runtime; .NET9 available on hosts; Windows containers are outofscope this cycle.
Embedded auth simplifies deployment but may need plugins for enterprise IdPs.
Speed is prioritised over exhaustive feature parity with heavyweight commercial scanners.
---
##13·References & Further Reading
* **C4 Model** <https://c4model.com>
* **.NET Architecture Guides** <https://learn.microsoft.com/dotnet/architecture>
* **OSS Examples** Kubernetes Architecture docs, Prometheus design papers, Backstage.
---
##14·Change Log
*(Git history is authoritative; table for reader convenience.)*
| Date | Note |
|------------|--------------------------------------------------------------------------------------------------------|
| 20250713 | Added internal registry, multiformat SBOM, delta flow, Policy as Code, Attestor integration sections. |
| 20250712 | Reorganised doc around C4; added diagrams, tradeoffs, security notes. |
| 20250711 | Initial opensourcing pass  removed commercial references, added plugin hooks and UI details. |
---
##14Change Log
| Version | Date | Notes |
| ------- | ---------- | ------------------------------------------------------------------------------------------------ |
| v2.4 | 20250712 | New modules |
| v2.3 | 20250712 | Adopted C4 structure, added diagrams/trade-offs/security notes/contribution hooks/references. |
| v2.2 | 20250711 | Removed last commercial refs; added TLS/IdP/report plugin hooks; deeper UI & scenario sections; |
| v2.1 | 20250711 | Added scenarios, UI details, plugin depth. |
| v2.0 | 20250711 | Full rewrite. |
*(End of HighLevel Architecture v2.2)*