# 7 · High-Level Architecture: StellaOps


## 0 · Purpose & Scope

Give contributors, DevOps engineers and auditors a complete yet readable map of the Core:

  • Major runtime components and message paths.
  • Where plugins, CLI helpers and runtime agents attach.
  • Technology choices that enable the sub-5-second SBOM goal.
  • Typical operational scenarios (pipeline scan, mute, nightly rescan, etc.).

Anything enterprise-only (signed PDF, crypto-specific TLS, LDAP, enforcement) must arrive as a plugin; the Core never hard-codes those concerns.


## 1 · Component Overview

| # | Component | Responsibility |
|---|-----------|----------------|
| 1 | API Gateway | REST endpoints (`/scan`, `/quota`, `/token/offline`); token auth; quota enforcement |
| 2 | Scan Service | SBOM parsing, Delta-SBOM cache, vulnerability lookup |
| 3 | Policy Engine | YAML / (optional) Rego rule evaluation; verdict assembly |
| 4 | Quota Service | Per-token counters; 333 scans/day; waits & HTTP 429 |
| 5 | Client-JWT Issuer | Issues 30-day offline tokens; bundles them into OUK |
| 6 | Registry | Anonymous internal Docker registry for agents, SBOM uploads |
| 7 | Web UI | Angular 17 SPA; dashboards, policy editor, quota banner |
| 8 | Data Stores | Redis (cache, quota) & MongoDB (SBOMs, findings, audit) |
| 9 | Plugin Host | Hot-load .NET DLLs; isolates community plugins |
| 10 | Agents | sbom-builder, SanTech scanner CLI, future StellaOps-Attestor |

```mermaid
flowchart TD
    subgraph "External Actors"
        DEV["Developer / DevSecOps / Manager"]
        CI["CI/CD Pipeline (e.g., SanTech CLI)"]
        K8S["Kubernetes Cluster (e.g., Zastava Agent)"]
    end

    subgraph "Stella Ops Runtime"
        subgraph "Core Services"
            CORE["Stella Core<br>(REST + gRPC APIs, Orchestration)"]
            REDIS[("Redis<br>(Cache, Queues, Trivy DB Mirror)")]
            MONGO[("MongoDB<br>(Optional: Long-term Storage)")]
            POL["Mute Policies<br>(OPA & YAML Evaluator)"]
            REG["StellaOps Registry<br>(Docker Registry v2)"]
            ATT["StellaOps Attestor<br>(SLSA + Rekor)"]
        end

        subgraph "Agents & Builders"
            SB["SBOM Builder<br>(Go Binary: Extracts Layers, Generates SBOMs)"]
            SA["SanTech Agent<br>(Pipeline Helper: Invokes Builder, Triggers Scans)"]
            ZA["Zastava Agent<br>(K8s Webhook: Enforces Policies, Inventories Containers)"]
        end

        subgraph "Scanners & UI"
            TRIVY["Trivy Scanner<br>(Plugin Container: Vulnerability Scanning)"]
            UI["Web UI<br>(Angular 17 + Tailwind: Dashboards, Policy Editor)"]
            CLI["Stella CLI<br>(CLI Helper: Triggers Scans, Mutes)"]
        end
    end

    DEV -->|Browses Findings, Mutes CVEs| UI
    DEV -->|Triggers Scans| CLI
    CI -->|Generates SBOM, Calls /scan| SA
    K8S -->|Inventories Containers, Enforces Gates| ZA

    UI -- "REST" --> CORE
    CLI -- "REST/gRPC" --> CORE
    SA -->|Scan Requests| CORE
    SB -->|Uploads SBOMs| CORE
    ZA -->|Policy Gates| CORE

    CORE -- "Queues, Caches" --> REDIS
    CORE -- "Persists Data" --> MONGO
    CORE -->|Evaluates Policies| POL
    CORE -->|Attests Provenance| ATT
    CORE -->|Scans Vulnerabilities| TRIVY

    SB -- "Pulls Images" --> REG
    SA -- "Pulls Images" --> REG
    ZA -- "Pulls Images" --> REG

    style DEV fill:#f9f,stroke:#333
    style CI fill:#f9f,stroke:#333
    style K8S fill:#f9f,stroke:#333
    style CORE fill:#ddf,stroke:#333
    style REDIS fill:#fdd,stroke:#333
    style MONGO fill:#fdd,stroke:#333
    style POL fill:#dfd,stroke:#333
    style REG fill:#dfd,stroke:#333
    style ATT fill:#dfd,stroke:#333
    style SB fill:#fdf,stroke:#333
    style SA fill:#fdf,stroke:#333
    style ZA fill:#fdf,stroke:#333
    style TRIVY fill:#ffd,stroke:#333
    style UI fill:#ffd,stroke:#333
    style CLI fill:#ffd,stroke:#333
```

  • Developer / DevSecOps / Manager: browses findings, mutes CVEs, triggers scans.
  • SanTech CLI: generates SBOMs and calls `/scan` during CI.
  • Zastava Agent: inventories live containers; Core ships it in passive mode only (no kill).

### 1.1 Client-JWT Lifecycle (offline-aware)

  1. Online instance: the user signs in → `/connect/token` issues a JWT valid for 12 h.
  2. Offline instance: a JWT with exp ≈ 30 days ships in the OUK; the backend re-signs and stores it during import.
  3. Tokens embed a tier claim (“Free”) and `maxScansPerDay: 333`.
  4. The UI surfaces a red toast 7 days before the token expires.
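
The lifecycle rules above can be sketched as a small claims builder plus an expiry-warning check. This is only an illustration, not the Core's implementation: the function names `build_claims` and `should_warn` and the claim key `tier` are assumptions. Only the 12 h and 30-day lifetimes, the `maxScansPerDay: 333` claim and the 7-day warning window come from the list.

```python
from datetime import datetime, timedelta, timezone

ONLINE_TTL = timedelta(hours=12)   # online tokens: valid for 12 h
OFFLINE_TTL = timedelta(days=30)   # offline tokens shipped in the OUK: about 30 days
WARN_WINDOW = timedelta(days=7)    # the UI warns this long before expiry


def build_claims(now: datetime, offline: bool) -> dict:
    """Return an illustrative claim set for a Free-tier token (key names assumed)."""
    ttl = OFFLINE_TTL if offline else ONLINE_TTL
    return {
        "tier": "Free",
        "maxScansPerDay": 333,
        "exp": int((now + ttl).timestamp()),
    }


def should_warn(claims: dict, now: datetime) -> bool:
    """True when the token is still valid but expires within the warning window."""
    exp = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    return now < exp <= now + WARN_WINDOW
```

In this sketch the warning is only meaningful for offline tokens, because a 12 h online token always sits inside the 7-day window.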

## 2 · Component Responsibilities (runtime view)

| Component | Core Responsibility | Implementation Highlights |
|-----------|---------------------|---------------------------|
| Stella Core | Orchestrates scans, persists SBOM blobs, serves REST/gRPC APIs, fans out jobs to scanners & policy engine. | .NET 8, CQRS, Redis Streams; pluggable runner interfaces. |
| SBOM Builder | Extracts image layers, queries Core for missing layers, generates SBOMs (multi-format), uploads blobs. | Go binary; wraps Trivy & Syft libs. |
| SanTech Agent | Pipeline-side helper; invokes Builder, triggers scan, streams progress back to CI/CD. | Static musl build. |
| Zastava Agent | K8s admission webhook enforcing policy verdicts before Pod creation. | Rust for sub-10 ms latencies. |
| UI | Angular 17 SPA for dashboards, settings, policy editor. | Tailwind CSS; Webpack module federation (future). |
| Redis | Cache, queue, Trivy-DB mirror, layer diffing. | Single instance or Sentinel. |
| MongoDB (opt.) | Long-term SBOM & policy audit storage (>180 days). | Optional; enabled via flag. |
| StellaOps.Registry | Anonymous read-only Docker v2 registry with optional Cosign verification. | `registry:2` behind nginx reverse proxy. |
| StellaOps.MutePolicies | YAML/Rego evaluator, policy version store, `/policy/*` API. | Embeds OPA-WASM; falls back to `opa exec`. |
| StellaOps-Attestor | Generates SLSA provenance & Rekor signatures; verifies on demand. | Sidecar container; DSSE + Rekor CLI. |

All cross-component calls use dependency-injected interfaces; no intra-component reach-ins.


## 3 · Principal Backend Modules & Plugin Hooks

| Namespace | Responsibility | Built-in Tech / Default | Plugin Contract |
|-----------|----------------|-------------------------|-----------------|
| configuration | Parse env/JSON, health-check endpoint | .NET 9 Options | `IConfigValidator` |
| identity | Embedded OAuth2/OIDC (OpenIddict 6) | MIT OpenIddict | `IIdentityProvider` for LDAP/SAML/JWT gateway |
| pluginloader | Discover DLLs, SemVer gate, optional Cosign verify | Reflection + Cosign | `IPluginLifecycleHook` for telemetry |
| scanning | SBOM- & image-flow orchestration; runner pool | Trivy CLI (default) | `IScannerRunner`, e.g., Grype, Copacetic, Clair |
| feedmerger | Nightly NVD merge & feed enrichment | Hangfire job | drop-in `*.Schedule.dll` for OSV, GHSA, BDU feeds |
| tls | TLS provider abstraction | OpenSSL | `ITlsProvider` for GOST, SM-series, custom suites |
| reporting | Render HTML/PDF reports | RazorLight | `IReportRenderer` |
| ui | Angular SPA & i18n | Angular 17 | new locales via `/locales/{lang}.json` |
| scheduling | Cron + retries | Hangfire | any recurrent job via `*.Schedule.dll` |

```mermaid
classDiagram
    class configuration
    class identity
    class pluginloader
    class scanning
    class feedmerger
    class tls
    class reporting
    class ui
    class scheduling

    class AllModules

    configuration ..> identity : Uses
    identity ..> pluginloader : Authenticates Plugins
    pluginloader ..> scanning : Loads Scanner Runners
    scanning ..> feedmerger : Triggers Feed Merges
    tls ..> AllModules : Provides TLS Abstraction
    reporting ..> ui : Renders Reports for UI
    scheduling ..> feedmerger : Schedules Nightly Jobs

    note for scanning "Pluggable: IScannerRunner<br>e.g., Trivy, Grype"
    note for feedmerger "Pluggable: *.Schedule.dll<br>e.g., OSV, GHSA Feeds"
    note for identity "Pluggable: IReportRenderer<br>e.g., Custom PDF"
    note for identity "Pluggable: IIdentityProvider<br>e.g., LDAP, SAML"
```
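
To make the plugin-contract idea concrete, here is a rough sketch of how a runner pool could select a scanner behind a common interface. The real contracts are .NET interfaces, so this Python version is only an analogy. `ScannerRunner`, `RunnerPool`, `FakeTrivyRunner` and the `scan` method are hypothetical names, not the actual `IScannerRunner` signature.

```python
from typing import Optional, Protocol


class ScannerRunner(Protocol):
    """Python stand-in for the IScannerRunner contract (member names are assumed)."""

    name: str

    def scan(self, sbom: dict) -> list: ...


class RunnerPool:
    """Registry that picks a runner by name and falls back to the default engine."""

    def __init__(self, default: str) -> None:
        self._runners = {}
        self._default = default

    def register(self, runner: ScannerRunner) -> None:
        self._runners[runner.name] = runner

    def scan(self, sbom: dict, engine: Optional[str] = None) -> list:
        runner = self._runners[engine or self._default]
        return runner.scan(sbom)


class FakeTrivyRunner:
    """Hypothetical runner that reports every component; stands in for a real scanner."""

    name = "trivy"

    def scan(self, sbom: dict) -> list:
        return [{"component": c, "engine": self.name} for c in sbom.get("components", [])]
```

Callers depend only on the pool and the contract, so swapping Trivy for another engine would mean registering a different runner rather than changing the caller.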

When a token's remaining quota reaches 0, the API returns 429 Too Many Requests with `Retry-After: <UTC-midnight>` (sequence omitted for brevity).
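
A minimal sketch of that quota rule, assuming an in-memory counter in place of the Redis-backed one and a timezone-aware UTC `now`. The class name `QuotaCounter` and the `X-Quota-Remaining` header are illustrative assumptions; the 333/day limit, the 429 status and the reset at UTC midnight come from the text. The "waits" phase mentioned in the component table is not modelled.

```python
from datetime import datetime, timedelta, timezone

DAILY_LIMIT = 333  # scans per token per UTC day


class QuotaCounter:
    """In-memory stand-in for the per-token counters."""

    def __init__(self) -> None:
        self._used = {}

    def check(self, token_id: str, now: datetime) -> tuple:
        """Consume one scan and return (HTTP status, response headers)."""
        key = (token_id, now.strftime("%Y-%m-%d"))
        used = self._used.get(key, 0)
        if used >= DAILY_LIMIT:
            # The quota resets at the next UTC midnight.
            midnight = (now + timedelta(days=1)).replace(
                hour=0, minute=0, second=0, microsecond=0
            )
            return 429, {"Retry-After": midnight.strftime("%a, %d %b %Y %H:%M:%S GMT")}
        self._used[key] = used + 1
        return 200, {"X-Quota-Remaining": str(DAILY_LIMIT - used - 1)}
```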


## 4 · Data Flows

### 4.1 SBOM-First (≤ 5 s P95)

The Builder produces the SBOM locally, so Core never touches the Docker socket. The Trivy path completes in ≤ 5 s on alpine:3.19 with a warmed DB. The image-unpack fallback stays ≤ 10 s for 200 MB images.

```mermaid
sequenceDiagram
    participant CI as CI/CD Pipeline (SanTech Agent)
    participant SB as SBOM Builder
    participant CORE as Stella Core
    participant REDIS as Redis Queue
    participant RUN as Scanner Runner (e.g., Trivy)
    participant POL as Policy Evaluator

    CI->>SB: Invoke SBOM Generation
    SB->>CORE: Check Missing Layers (/layers/missing)
    CORE->>REDIS: Query Layer Diff (SDIFF)
    REDIS-->>CORE: Missing Layers List
    CORE-->>SB: Return Missing Layers
    SB->>SB: Generate Delta SBOM
    SB->>CORE: Upload SBOM Blob (POST /scan(sbom))
    CORE->>REDIS: Enqueue Scan Job
    REDIS->>RUN: Fan Out to Runner
    RUN->>RUN: Perform Vulnerability Scan
    RUN-->>CORE: Return Scan Results
    CORE->>POL: Evaluate Mute Policies
    POL-->>CORE: Policy Verdict
    CORE-->>CI: JSON Verdict & Progress Stream
    Note over CORE,CI: Achieves ≤5s P95 with Warmed DB
```

### 4.2 Delta SBOM

The Builder collects layer digests and sends them via `POST /layers/missing`; Core runs a Redis SDIFF and returns the list of missing layers (< 20 ms). An SBOM is generated only for those layers and uploaded.
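
The delta step boils down to a set difference. The sketch below mirrors in plain Python what SDIFF computes on the Redis side; `missing_layers`, `delta_sbom` and the `generate` callback are hypothetical names used for illustration, not the Builder's API.

```python
def missing_layers(image_layers: list, known_layers: set) -> list:
    """Return the digests Core does not know yet, preserving image order.

    Equivalent to the set difference (image layers minus known layers)
    that SDIFF returns.
    """
    return [digest for digest in image_layers if digest not in known_layers]


def delta_sbom(image_layers: list, known_layers: set, generate) -> dict:
    """Generate SBOM fragments only for the missing layers."""
    return {
        digest: generate(digest)
        for digest in missing_layers(image_layers, known_layers)
    }
```

Layers that Core already knows never reach `generate`, which is where the time saving over a full SBOM comes from.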

### 4.3 Feed Enrichment

```mermaid
sequenceDiagram
    participant CRON as Nightly Cron (Hangfire)
    participant FM as Feed Merger
    participant NVD as NVD Feed
    participant OSV as OSV Plugin (Optional)
    participant GHSA as GHSA Plugin (Optional)
    participant BDU as BDU Plugin (Optional)
    participant REDIS as Redis (Merged Feed Storage)
    participant UI as Web UI

    CRON->>FM: Trigger at 00:59
    FM->>NVD: Fetch & Merge NVD Data
    alt Optional Plugins
        FM->>OSV: Merge OSV Feed
        FM->>GHSA: Merge GHSA Feed
        FM->>BDU: Merge BDU Feed
    end
    FM->>REDIS: Persist Merged Feed
    REDIS-->>UI: Update Feed Freshness
    UI->>UI: Display Green 'Feed Age' Tile
```
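
One way to picture the merge step is a keyed union in which NVD is the base and optional feeds only enrich it. The precedence rule (earlier sources win on conflicting fields), the record shape and the name `merge_feeds` are assumptions made for this sketch; the document does not specify how conflicts are resolved.

```python
def merge_feeds(nvd: dict, *plugin_feeds: dict) -> dict:
    """Merge advisory feeds keyed by CVE id.

    NVD records form the base. Each plugin feed only fills fields that are
    still missing, so earlier sources win on conflicts (assumed policy).
    """
    merged = {cve: dict(record) for cve, record in nvd.items()}
    for feed in plugin_feeds:
        for cve, record in feed.items():
            target = merged.setdefault(cve, {})
            for field, value in record.items():
                target.setdefault(field, value)
    return merged
```

The inputs are copied rather than mutated, so a failed plugin merge could be retried against the same NVD snapshot.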

### 4.4 Identity & Auth Flow

OpenIddict issues JWTs via the client-credentials or password grant. An `IIdentityProvider` plugin can delegate to LDAP, SAML or external OIDC without Core changes.

## 5 · Runtime Helpers

| Helper | Form | Purpose | Extensible Bits |
|--------|------|---------|-----------------|
| SanTech | Distroless CLI | Generates SBOM, calls `/scan`, honours threshold flag | `--engine`, `--pdf-out` piped to plugins |
| Zastava | Static Go binary / DaemonSet | Watches Docker/CRI-O events; uploads SBOMs; can enforce gate | Policy plugin could alter thresholds |

## 6 · Persistence & Cache Strategy

| Store | Primary Use | Why chosen |
|-------|-------------|------------|
| Redis 7 | Queue, SBOM cache, Trivy DB mirror | Sub-1 ms P99 latency |
| MongoDB | History > 180 d, audit logs, policy versions | Optional; document-oriented |
| Local tmpfs | Trivy layer cache (`/var/cache/trivy`) | Keeps disk I/O off hot path |

```mermaid
flowchart LR
    subgraph "Persistence Layers"
        REDIS[("Redis: Fast Cache/Queues<br>Sub-1ms P99")]
        MONGO[("MongoDB: Optional Audit/History<br>>180 Days")]
        TMPFS[("Local tmpfs: Trivy Layer Cache<br>Low I/O Overhead")]
    end

    CORE["Stella Core"] -- "Queues & SBOM Cache" --> REDIS
    CORE -- "Long-term Storage" --> MONGO
    TRIVY["Trivy Scanner"] -- "Layer Unpack Cache" --> TMPFS

    style REDIS fill:#fdd,stroke:#333
    style MONGO fill:#dfd,stroke:#333
    style TMPFS fill:#ffd,stroke:#333
```

## 7 · Typical Scenarios

| # | Flow | Steps |
|---|------|-------|
| S1 | Pipeline Scan & Alert | SanTech → SBOM → `/scan` → policy verdict → CI exit code & link to Scan Detail |
| S2 | Mute Noisy CVE | Dev toggles Mute in UI → rule stored in Redis → next build passes |
| S3 | Nightly Rescan | `SbomNightly.Schedule` requeues SBOMs (mask-filter) → dashboard highlights new Criticals |
| S4 | Feed Update Cycle | FeedMerger merges feeds → UI Feed Age tile turns green |
| S5 | Custom Report Generation | Plugin registers `IReportRenderer` → `/report/custom/{digest}` → CI downloads artifact |

```mermaid
sequenceDiagram
    participant DEV as Developer
    participant UI as Web UI
    participant CORE as Stella Core
    participant REDIS as Redis
    participant RUN as Scanner Runner
    participant CI as CI/CD Pipeline

    DEV->>UI: Toggle Mute for CVE
    UI->>CORE: Update Mute Rule (POST /policy/mute)
    CORE->>REDIS: Store Mute Policy
    Note over CORE,REDIS: YAML/Rego Evaluator Updates

    alt Next Pipeline Build
        CI->>CORE: Trigger Scan (POST /scan)
        CORE->>RUN: Enqueue & Scan
        RUN-->>CORE: Raw Findings
        CORE->>REDIS: Apply Mute Policies
        REDIS-->>CORE: Filtered Verdict (Passes)
        CORE-->>CI: Success Exit Code
    end
```

```mermaid
sequenceDiagram
    participant CRON as SbomNightly.Schedule
    participant CORE as Stella Core
    participant REDIS as Redis Queue
    participant RUN as Scanner Runner
    participant UI as Dashboard

    CRON->>CORE: Re-queue SBOMs (Mask-Filter)
    CORE->>REDIS: Enqueue Filtered Jobs
    REDIS->>RUN: Fan Out to Runners
    RUN-->>CORE: New Scan Results
    CORE->>UI: Highlight New Criticals
    Note over CORE,UI: Focus on Changes Since Last Scan
```
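
Scenario S2 (mute a noisy CVE so the next build passes) can be illustrated with a filter plus an exit-code rule. This is a simplified sketch: the finding shape, the function names and the default failing severities are assumptions, and the real evaluator works on YAML/Rego policies rather than a plain set of CVE ids.

```python
def apply_mutes(findings: list, muted_cves: set) -> list:
    """Drop findings whose CVE id has been muted."""
    return [f for f in findings if f["cve"] not in muted_cves]


def exit_code(findings: list, fail_on=frozenset({"CRITICAL", "HIGH"})) -> int:
    """Return 0 when no remaining finding has a failing severity, otherwise 1."""
    return 1 if any(f["severity"] in fail_on for f in findings) else 0
```

A build that fails on a single critical finding would return 0 once that CVE id is in `muted_cves`, which matches the "next build passes" step of S2.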

## 8 · UI Fast Facts

  • Stack: Angular 17 + Vite dev server; Tailwind CSS.
  • State: Signals + RxJS for live scan progress.
  • i18n / l10n: JSON bundles served from `/locales/{lang}.json`.
  • Module Structure: lazy-loaded feature modules (dashboard, scans, settings); runtime route injection by UI plugins (roadmap Q2 2026).

## 9 · Cross-Cutting Concerns

  • Security: containers run non-root, CAP_DROP:ALL, read-only FS, hardened seccomp profiles.
  • Observability: Serilog JSON, OpenTelemetry OTLP exporter, Prometheus `/metrics`.
  • Upgrade Policy: `/api/v1` endpoints & CLI flags stay stable across a minor release; breaking changes bump the major version.

## 10 · Performance & Scalability

| Scenario | P95 target | Bottleneck | Mitigation |
|----------|------------|------------|------------|
| SBOM-first | 5 s | Redis queue | More CPU, increase `ScannerPool.Workers` |
| Image-unpack | 10 s | Layer unpack | Prefer SBOM path, warm Docker cache |
| High concurrency | 40 rps | Runner CPU | Scale Core replicas + sidecar scanner services |

## 11 · Future Architectural Anchors

  • ScanService micro-split (gRPC): isolate heavy runners for large clusters.
  • UI route plugins: dynamic Angular module loader (roadmap Q2 2026).
  • Redis Cluster: transparently sharded cache once sustained load exceeds 100 rps.

## 12 · Assumptions & Trade-offs

  • Requires a Docker/CRI-O runtime; .NET 9 available on hosts; Windows containers are out of scope this cycle.
  • Embedded auth simplifies deployment but may need plugins for enterprise IdPs.
  • Speed is prioritised over exhaustive feature parity with heavyweight commercial scanners.


## 13 · References & Further Reading


## 14 · Change Log
(Git history is authoritative; table for reader convenience.)

| Date | Note |
|------|------|
| 2025-07-13 | Added internal registry, multi-format SBOM, delta flow, Policy as Code, Attestor integration sections. |
| 2025-07-12 | Reorganised doc around C4; added diagrams, trade-offs, security notes. |
| 2025-07-11 | Initial open-sourcing pass: removed commercial references, added plugin hooks and UI details. |

| Version | Date | Notes |
|---------|------|-------|
| v2.4 | 2025-07-12 | New modules. |
| v2.3 | 2025-07-12 | Adopted C4 structure, added diagrams/trade-offs/security notes/contribution hooks/references. |
| v2.2 | 2025-07-11 | Removed last commercial refs; added TLS/IdP/report plugin hooks; deeper UI & scenario sections. |
| v2.1 | 2025-07-11 | Added scenarios, UI details, plugin depth. |
| v2.0 | 2025-07-11 | Full rewrite. |

(End of High-Level Architecture v2.2)