Created detailed architectural documentation showing component interactions, communication patterns, and data flows across all StellaOps services.

## New Documentation

**docs/ARCHITECTURE_DETAILED.md** - Comprehensive architecture guide:
- Component topology diagram (all 36+ services)
- Infrastructure layer details (PostgreSQL, Valkey, RustFS, NATS)
- Service-by-service catalog with responsibilities
- Communication patterns with WHY (business purpose)
- 5 detailed data flow diagrams:
  1. Scan Request Flow (CLI → Scanner → Worker → Policy → Signer → Attestor → Notify)
  2. Advisory Update Flow (Concelier → Scheduler → Scanner re-evaluation)
  3. VEX Update Flow (Excititor → IssuerDirectory → Scheduler → Policy)
  4. Notification Delivery Flow (Scanner → Valkey → Notify → Slack/Teams/Email)
  5. Policy Evaluation Flow (Scanner → Policy.Gateway → OPA → PostgreSQL replication)
- Database schema isolation details per service
- Security boundaries and authentication flows

## Updated Documentation

**docs/DEVELOPER_ONBOARDING.md**:
- Added link to detailed architecture
- Simplified overview with component categories
- Quick reference topology tree

**docs/07_HIGH_LEVEL_ARCHITECTURE.md**:
- Updated infrastructure requirements section
- Clarified PostgreSQL as ONLY database
- Emphasized Valkey as REQUIRED (not optional)
- Marked NATS as optional (Valkey is default transport)

**docs/README.md**:
- Added link to detailed architecture in navigation

## Key Architectural Insights Documented

**Communication Patterns:**
- 11 communication steps in scan flow (Gateway → Scanner → Valkey → Worker → Concelier → Policy → Signer → Attestor → Valkey → Notify → Slack)
- PostgreSQL logical replication (advisory_raw_stream, vex_raw_stream → Policy Engine)
- Valkey Streams for async job queuing (XADD/XREADGROUP pattern)
- HTTP webhooks for delta events (Concelier/Excititor → Scheduler)

**Security Boundaries:**
- Authority issues OpToks with DPoP binding (RFC 9449)
- Signer enforces PoE validation + scanner digest verification
- All services validate JWT + DPoP on every request
- Tenant isolation via tenant_id in all PostgreSQL queries

**Database Patterns:**
- 8 dedicated PostgreSQL schemas (authority, scanner, vuln, vex, scheduler, notify, policy, orchestrator)
- Append-only advisory/VEX storage (AOC - Aggregation-Only Contract)
- BOM-Index for impact selection (CVE → PURL → image mapping)

This documentation provides complete visibility into who calls who, why they communicate, what data flows through the system, and how security is enforced at every layer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

1519 lines
68 KiB
Markdown
# StellaOps Platform - Detailed Architecture

**Last Updated:** 2025-12-23
**Purpose:** Comprehensive component architecture with communication patterns and data flows

## Table of Contents

1. [Component Topology](#component-topology)
2. [Infrastructure Layer](#infrastructure-layer)
3. [Service Catalog](#service-catalog)
4. [Communication Patterns](#communication-patterns)
5. [Data Flow Diagrams](#data-flow-diagrams)
6. [Database Schema Isolation](#database-schema-isolation)
7. [Security Boundaries](#security-boundaries)

---

## Component Topology

```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ CLIENT LAYER │
|
|
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
|
│ │ stella │ │ Web UI │ │ CI/CD │ │ Zastava │ │
|
|
│ │ CLI │ │ Angular │ │ Pipeline │ │ Observer │ │
|
|
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ └─────┬────┘ │
|
|
│ │ │ │ │ │
|
|
└────────┼─────────────┼─────────────┼─────────────┼──────────────────────────┘
|
|
│ │ │ │
|
|
└─────────────┴─────────────┴─────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ GATEWAY LAYER │
|
|
│ ┌───────────────────────────────────────────────────────────────┐ │
|
|
│ │ Gateway.WebService │ │
|
|
│ │ • JWT validation • Rate limiting │ │
|
|
│ │ • DPoP verification • Request routing │ │
|
|
│ │ • Tenant resolution • Correlation tracking │ │
|
|
│ └───┬────────────────────────────────────────────────┬───────────┘ │
|
|
│ │ │ │
|
|
└──────┼────────────────────────────────────────────────┼─────────────────────┘
|
|
│ │
|
|
▼ ▼
|
|
┌─────────────────┐ ┌─────────────────┐
|
|
│ AUTHORITY │◄───────────────────────────│ ALL SERVICES │
|
|
│ │ OpTok validation │ (Resource │
|
|
│ • OAuth2/OIDC │ DPoP nonce verification │ servers) │
|
|
│ • DPoP binding │ │ │
|
|
│ • OpTok issue │ └─────────────────┘
|
|
│ • mTLS verify │
|
|
└────────┬────────┘
|
|
│ stores tokens,
|
|
│ audit trails
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ CORE SERVICES LAYER │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ SCANNING ENGINE │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Scanner.WebService │────────▶│ Scanner.Worker │ │ │
|
|
│ │ │ │ Valkey │ │ │ │
|
|
│ │ │ • Scan orchestrate │ queue │ • Layer analysis │ │ │
|
|
│ │ │ • Report catalog │ │ • SBOM generation │ │ │
|
|
│ │ │ • Policy eval │ │ • Reachability │ │ │
|
|
│ │ └─────┬──────────────┘ └────────┬───────────┘ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ linkset │ artifact │ │
|
|
│ │ │ query │ upload │ │
|
|
│ │ ▼ ▼ │ │
|
|
│ │ ┌──────────────┐ ┌──────────────┐ │ │
|
|
│ │ │ Concelier │ │ RustFS │ │ │
|
|
│ │ │ WebService │ │ (S3 API) │ │ │
|
|
│ │ └──────────────┘ └──────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ ADVISORY INGESTION ENGINE │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Concelier.WebService│──────▶│ Concelier.Worker │ │ │
|
|
│ │ │ │ Jobs │ │ │ │
|
|
│ │ │ • Ingest advisories│ │ • Connector fetch │ │ │
|
|
│ │ │ • Compute linksets │ │ • Normalize data │ │ │
|
|
│ │ │ • AOC enforcement │ │ • Delta detection │ │ │
|
|
│ │ └─────┬──────────────┘ └────────────────────┘ │ │
|
|
│ │ │ │ │
|
|
│ │ │ webhook: advisory delta events │ │
|
|
│ │ ▼ │ │
|
|
│ │ ┌──────────────┐ │ │
|
|
│ │ │ Scheduler │ │ │
|
|
│ │ │ WebService │ │ │
|
|
│ │ └──────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ VEX INGESTION ENGINE │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Excititor.WebService│──────▶│ Excititor.Worker │ │ │
|
|
│ │ │ │ Jobs │ │ │ │
|
|
│ │ │ • Ingest VEX │ │ • Fetch VEX feeds │ │ │
|
|
│ │ │ • DSSE verify │ │ • Trust verify │ │ │
|
|
│ │ │ • Consensus calc │ │ • Signature check │ │ │
|
|
│ │ └─────┬──────────────┘ └──────┬─────────────┘ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ webhook: VEX delta │ trust lookup │ │
|
|
│ │ ▼ ▼ │ │
|
|
│ │ ┌──────────────┐ ┌──────────────┐ │ │
|
|
│ │ │ Scheduler │ │ Issuer │ │ │
|
|
│ │ │ WebService │ │ Directory │ │ │
|
|
│ │ └──────────────┘ └──────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ ORCHESTRATION & SCHEDULING │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Scheduler.WebService│──────▶│ Scheduler.Worker │ │ │
|
|
│ │ │ │ Jobs │ │ │ │
|
|
│ │ │ • Impact select │ │ • Re-scan trigger │ │ │
|
|
│ │ │ • Rate limit │ │ • Batch enforce │ │ │
|
|
│ │ │ • Maintenance win │ │ • Progress track │ │ │
|
|
│ │ └─────┬──────────────┘ └──────┬─────────────┘ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ │ HTTP: enqueue scan │ │
|
|
│ │ │ ▼ │ │
|
|
│ │ │ ┌──────────────┐ │ │
|
|
│ │ │ │ Scanner.Web │ │ │
|
|
│ │ │ └──────────────┘ │ │
|
|
│ │ │ │ │
|
|
│ │ ┌─────▼──────────────┐ │ │
|
|
│ │ │ Orchestrator.Web │ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ • DAG workflows │ │ │
|
|
│ │ │ • Pack runs │ │ │
|
|
│ │ │ • Job streaming │ │ │
|
|
│ │ └────────────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ NOTIFICATION ENGINE │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Notify.WebService │────────▶│ Notify.Worker │ │ │
|
|
│ │ │ │ Valkey │ │ │ │
|
|
│ │ │ • Channel mgmt │ Streams │ • Slack delivery │ │ │
|
|
│ │ │ • Template engine │ XADD/ │ • Teams delivery │ │ │
|
|
│ │ │ • Throttle/digest │ XREAD │ • Email delivery │ │ │
|
|
│ │ └─────▲──────────────┘ └──────┬─────────────┘ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ report.ready events │ External HTTP/SMTP │ │
|
|
│ │ │ ▼ │ │
|
|
│ │ ┌─────┴──────────────┐ ┌──────────────┐ │ │
|
|
│ │ │ Scanner.Web │ │ Slack API │ │ │
|
|
│ │ │ (events) │ │ Teams API │ │ │
|
|
│ │ └────────────────────┘ │ SMTP │ │ │
|
|
│ │ └──────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ CRYPTOGRAPHIC SERVICES │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Signer.WebService │────────▶│ Attestor.WebService│ │ │
|
|
│ │ │ │ mTLS │ │ │ │
|
|
│ │ │ • DSSE signing │ OpTok │ • Rekor v2 submit │ │ │
|
|
│ │ │ • PoE validation │ │ • Receipt verify │ │ │
|
|
│ │ │ • Multi-profile │ │ • Offline bundles │ │ │
|
|
│ │ │ FIPS/GOST/SM │ │ │ │ │
|
|
│ │ └─────┬──────────────┘ └──────┬─────────────┘ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ KMS/PKCS11 │ External │ │
|
|
│ │ ▼ ▼ │ │
|
|
│ │ ┌──────────────┐ ┌──────────────┐ │ │
|
|
│ │ │ External KMS │ │ Rekor v2 │ │ │
|
|
│ │ │ (AWS/GCP) │ │ (Sigstore) │ │ │
|
|
│ │ └──────────────┘ └──────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
|
│ │ POLICY ENGINE │ │
|
|
│ │ │ │
|
|
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
|
|
│ │ │ Policy.Gateway │────────▶│ Policy Engine │ │ │
|
|
│ │ │ │ HTTP │ (OPA/Rego) │ │ │
|
|
│ │ │ • Exception mgmt │ │ │ │ │
|
|
│ │ │ • Approval flow │ │ • Rule eval │ │ │
|
|
│ │ │ • Delta compute │ │ • Verdict compute │ │ │
|
|
│ │ └─────▲──────────────┘ └──────▲─────────────┘ │ │
|
|
│ │ │ │ │ │
|
|
│ │ │ policy eval request │ PostgreSQL │ │
|
|
│ │ │ │ logical replication │ │
|
|
│ │ ┌─────┴──────────────┐ │ │ │
|
|
│ │ │ Scanner.Web │ ┌─────┴──────────┐ │ │
|
|
│ │ │ (verdict request) │ │ advisory_raw │ │ │
|
|
│ │ └────────────────────┘ │ vex_raw │ │ │
|
|
│ │ │ (streams) │ │ │
|
|
│ │ └────────────────┘ │ │
|
|
│ └─────────────────────────────────────────────────────────────────┘ │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ INFRASTRUCTURE LAYER │
|
|
│ │
|
|
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
|
│ │ PostgreSQL │ │ Valkey │ │ RustFS │ │
|
|
│ │ v16+ │ │ v8.0 │ │ (S3-compatible) │ │
|
|
│ │ │ │ │ │ │ │
|
|
│ │ • Per-service │ │ • DPoP nonces │ │ • SBOM artifacts │ │
|
|
│ │ schemas │ │ • Event streams │ │ • Proof bundles │ │
|
|
│ │ • Logical │ │ • Job queues │ │ • CAS storage │ │
|
|
│ │ replication │ │ • Cache │ │ │ │
|
|
│ │ • REQUIRED │ │ • REQUIRED │ │ • REQUIRED │ │
|
|
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
|
│ │
|
|
│ ┌──────────────────┐ │
|
|
│ │ NATS │ │
|
|
│ │ JetStream │ │
|
|
│ │ │ │
|
|
│ │ • Message queue │ │
|
|
│ │ • Optional │ │
|
|
│ │ (Valkey is │ │
|
|
│ │ default) │ │
|
|
│ └──────────────────┘ │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
```

---

## Infrastructure Layer

### PostgreSQL v16+ (REQUIRED)

**Purpose:** Primary database for ALL persistent data

**Schema Isolation:**

| Schema | Owner Service | Purpose |
|--------|---------------|---------|
| `authority` | Authority | Users, clients, tenants, keys, audit trails |
| `scanner` | Scanner | Scan manifests, triage, EPSS, reachability graphs |
| `vuln` | Concelier | Advisory raw documents, linksets, observations |
| `vex` | Excititor | VEX raw documents, consensus, provider state |
| `scheduler` | Scheduler | Graph jobs, runs, schedules, impact snapshots |
| `notify` | Notify | Channels, templates, delivery history, digests |
| `policy` | Policy.Gateway | Exception objects, snapshots, unknowns |
| `orchestrator` | Orchestrator | Sources, runs, jobs, DAGs, pack runs |

**Special Features:**
- **Logical Replication:** `advisory_raw_stream`, `vex_raw_stream` → Policy Engine
- **Per-tenant isolation:** Tenant ID in all tables for row-level security
- **Append-only patterns:** AOC (Aggregation-Only Contract) for advisory/VEX immutability

### Valkey v8.0 (REQUIRED)

**Purpose:** Cache, DPoP security, event streams, job queues

**Use Cases:**

| Pattern | Services | Purpose |
|---------|----------|---------|
| DPoP nonces | Authority | RFC 9449 nonce storage (30s TTL) |
| Event streams | Scanner, Notify, Scheduler | XADD for `report.ready`, drift events |
| Job queues | Scanner, Notify | XREADGROUP for worker coordination |
| Cache | All services | Distributed caching with tenant prefixes |
| Rate limiting | Gateway, Authority | Token bucket counters |

**Default Transport:** Valkey Streams are preferred over NATS for queuing

### RustFS (REQUIRED)

**Purpose:** S3-compatible object storage for artifacts

**Buckets:**

| Bucket | Services | Content |
|--------|----------|---------|
| `scanner-artifacts` | Scanner | Layer SBOMs, composed SBOMs, proof bundles |
| `surface-cache` | Scanner.Worker | Extracted filesystem surfaces |
| `evidence-locker` | Evidence Locker | Immutable audit evidence |
| `cas-replay` | Replay Engine | Content-addressed snapshots |

**API:** HTTP/S3 with optional API key authentication

### NATS JetStream (OPTIONAL)

**Purpose:** Alternative messaging transport (not default)

**When to Use:**
- High-throughput environments requiring persistent streams
- Multi-datacenter replication scenarios
- When Valkey Streams are insufficient for scale

**Default:** Valkey is preferred; NATS is opt-in via configuration

---

## Service Catalog

### Gateway Layer

#### Gateway.WebService

**Port:** 8080 (HTTP), 8443 (HTTPS)
**Dependencies:** Authority (JWT validation), Backend services (routing)

**Responsibilities:**
- **Authentication:** JWT + DPoP verification on all requests
- **Authorization:** Scope-based access control (RBAC claims)
- **Tenant Resolution:** Multi-tenant routing via `X-Tenant-Id` header or JWT
- **Rate Limiting:** Per-client token bucket (Valkey-backed)
- **Request Routing:** Routes to Scanner, Concelier, Policy, Scheduler, Notify
- **Correlation Tracking:** Injects `X-Correlation-Id` for distributed tracing

**Security Boundaries:**
- TLS termination (mutual TLS optional)
- DPoP sender constraint validation
- OpTok refresh on expiry

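
The per-client token bucket listed under Rate Limiting can be illustrated with a minimal in-memory sketch. This is not the Gateway's implementation: the real counters are described as Valkey-backed, whereas `TokenBucket`, `PerClientLimiter`, and the capacity and refill values below are hypothetical names and numbers chosen only to show the technique. Each client gets its own bucket that refills at a fixed rate, and a request is allowed only if a token is available.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, capacity: float, rate: float, now: Optional[float] = None):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerClientLimiter:
    """One bucket per client ID (hypothetical in-memory stand-in for Valkey)."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self._buckets[client_id] = bucket
        return bucket.allow(now=now)
```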

---

### Authentication & Security

#### Authority

**Port:** 8440 (HTTPS)
**Database:** `authority` schema
**Dependencies:** Valkey (DPoP nonces), External LDAP/OIDC (plugins)

**Responsibilities:**
- **OAuth 2.1 Server:** Issues OpToks (operational tokens) with DPoP binding
- **Client Credentials Flow:** Machine-to-machine authentication
- **Resource Owner Password Flow:** User authentication with LDAP/OIDC
- **DPoP (RFC 9449):** Sender-constrained tokens with nonce validation
- **mTLS:** Certificate-based client authentication
- **Audit Trails:** All authentication events logged to PostgreSQL
- **Multi-Tenancy:** Tenant-scoped token issuance

**Token Types:**
- **OpTok:** Short-lived (15 min), DPoP-bound, scoped access token
- **Refresh Token:** Rotation-protected, 7-day expiry
- **ID Token:** OIDC identity claims

**Security:**
- DPoP nonces stored in Valkey with 30s TTL
- OpTok signatures verified by all resource servers
- Rate limiting on failed login attempts

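
The 30-second nonce lifetime noted above can be sketched with a small in-memory store. This is an illustration under stated assumptions, not Authority's code: the real nonces live in Valkey, and `NonceStore`, `issue`, and `is_valid` are hypothetical names. The sketch only shows the expiry rule: a nonce is accepted until its TTL elapses and is rejected afterwards or if it was never issued.

```python
import secrets
import time
from typing import Dict, Optional

NONCE_TTL_SECONDS = 30.0  # TTL taken from the document


class NonceStore:
    """Hypothetical in-memory stand-in for the Valkey-backed nonce storage."""

    def __init__(self, ttl: float = NONCE_TTL_SECONDS):
        self.ttl = ttl
        self._expiry: Dict[str, float] = {}  # nonce -> expiry timestamp

    def issue(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        nonce = secrets.token_urlsafe(16)
        self._expiry[nonce] = now + self.ttl
        return nonce

    def is_valid(self, nonce: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expires_at = self._expiry.get(nonce)
        if expires_at is None:
            return False  # never issued, or already evicted
        if now >= expires_at:
            del self._expiry[nonce]  # lazily evict expired nonces
            return False
        return True
```

With Valkey, the same effect would typically come from setting a key with an expiry rather than tracking timestamps by hand; the explicit `now` parameter here exists only to make the behaviour easy to test.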

#### Signer.WebService

**Port:** 8441 (HTTPS with mTLS)
**Dependencies:** Authority (PoE validation), External KMS (optional), OCI Registry (scanner digest verification)

**Responsibilities:**
- **DSSE Signing:** Signs in-toto envelopes for SBOMs, VEX, attestations
- **PoE Validation:** Validates Proof-of-Entitlement (license check)
- **Multi-Profile Keys:** FIPS, GOST (CryptoPro), SM (Chinese national crypto)
- **Scanner Authenticity:** Verifies scanner image digest is Stella Ops-signed
- **Key Management:** HSM/KMS integration (AWS, GCP, PKCS11)

**Hard Gates (Reject on Failure):**
1. OpTok validation (DPoP + mTLS)
2. PoE license check
3. Scanner image digest verification (cosign signature)

**Key Profiles:**

| Profile | Algorithm | Use Case |
|---------|-----------|----------|
| `default` | ECDSA P-256 | Standard signing |
| `fips` | ECDSA P-384 | FIPS 140-2 compliance |
| `gost` | GOST R 34.10-2012 | Russian regulations |
| `sm` | SM2 | Chinese regulations |

#### Attestor.WebService

**Port:** 8442 (HTTPS)
**Dependencies:** Signer (DSSE signing), Rekor v2 (transparency log)

**Responsibilities:**
- **Rekor Submission:** Posts DSSE bundles to Sigstore Rekor v2
- **Receipt Retrieval:** Fetches inclusion proofs from Rekor
- **Offline Bundles:** Generates offline verification bundles
- **Verification:** Validates Rekor receipts for CLI/CI

**Workflow:**
1. Receive DSSE envelope from Scanner/Excititor
2. Call Signer for signature (mTLS)
3. Submit signed DSSE to Rekor v2
4. Retrieve inclusion proof
5. Return receipt to caller

**Offline Mode:**
- Optional: Can operate without Rekor if `OFFLINEKIT_ENABLED=true`
- Uses local timestamp service for non-repudiation

---

### Scanning Engine

#### Scanner.WebService

**Port:** 8444 (HTTP)
**Database:** `scanner` schema
**Object Storage:** RustFS `scanner-artifacts` bucket
**Dependencies:** Authority (auth), Concelier (linkset queries), Policy (evaluation), Signer (DSSE), Attestor (Rekor)

**Responsibilities:**
- **Scan Orchestration:** Enqueues scan jobs to Scanner.Worker via Valkey
- **Report Catalog:** Maintains scan history, triage data, policy verdicts
- **Linkset Enrichment:** Queries Concelier for advisory linksets by PURL/CPE
- **Policy Evaluation:** Calls Policy.Gateway for verdict computation
- **SBOM Export:** Generates SPDX 3.0.1 and CycloneDX 1.6 SBOMs
- **VEX Export:** Calls Excititor for VEX statement generation
- **Proof Bundles:** Assembles DSSE envelopes with signatures + Rekor receipts
- **Event Publishing:** Emits `report.ready` events to Notify via Valkey Streams

**API Endpoints:**

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/v1/scans` | POST | Enqueue scan job |
| `/v1/scans/{id}` | GET | Retrieve scan report |
| `/v1/scans/{id}/sbom` | GET | Download SBOM (SPDX/CycloneDX) |
| `/v1/scans/{id}/vex` | GET | Download VEX document |
| `/v1/scans/{id}/proof` | GET | Download proof bundle (DSSE + Rekor receipt) |
| `/v1/triage` | POST | Mark finding as false positive |

**Queue Pattern:**
- Publishes to Valkey Stream: `scanner:jobs`
- Scanner.Worker consumes via `XREADGROUP`

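
To make the queue pattern concrete, here is a small in-memory model of one stream with one consumer group. It is not a Valkey client and makes simplifying assumptions: entry IDs are plain integers rather than Valkey's timestamp-sequence IDs, and the class and method names (`StreamWithGroup`, `xadd`, `xreadgroup`, `xack`) merely mirror the command names. What it shows is the coordination property the workers rely on: each entry is handed to exactly one consumer in the group and stays pending until acknowledged.

```python
from typing import Dict, List, Tuple


class StreamWithGroup:
    """In-memory model of a stream plus a single consumer group."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, dict]] = []  # (entry id, fields)
        self._next_id = 1
        self._cursor = 0  # index of the next entry never delivered to the group
        self._pending: Dict[int, str] = {}  # entry id -> consumer holding it

    def xadd(self, fields: dict) -> int:
        """Append an entry and return its ID."""
        entry_id = self._next_id
        self._next_id += 1
        self._entries.append((entry_id, dict(fields)))
        return entry_id

    def xreadgroup(self, consumer: str, count: int = 1) -> List[Tuple[int, dict]]:
        """Deliver up to `count` new entries to `consumer`, marking them pending."""
        batch = self._entries[self._cursor:self._cursor + count]
        self._cursor += len(batch)
        for entry_id, _ in batch:
            self._pending[entry_id] = consumer
        return batch

    def xack(self, entry_id: int) -> bool:
        """Acknowledge an entry; returns False if it was not pending."""
        return self._pending.pop(entry_id, None) is not None

    def pending(self) -> Dict[int, str]:
        return dict(self._pending)
```

In this model, two workers calling `xreadgroup` in turn receive different jobs, and an unacknowledged job remains visible in `pending()`, which is what would let a supervisor detect work claimed by a crashed worker.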

#### Scanner.Worker

**Database:** `scanner` schema (read EPSS, write inventory)
**Object Storage:** RustFS `scanner-artifacts`, `surface-cache`
**Dependencies:** Scanner.WebService (internal API), RustFS (upload)

**Responsibilities:**
- **Image Pull:** OCI image download and layer extraction
- **Layer Analysis:** Runs OS/language/native analyzers per layer
- **SBOM Generation:** Per-layer SBOMs in SPDX 3.0.1 format
- **Composition:** Merges layer SBOMs into final composed SBOM
- **Reachability Analysis:** Call-graph extraction for Java/Node/Go/Python
- **Artifact Upload:** Uploads SBOMs to RustFS
- **Progress Reporting:** Heartbeat to Scanner.WebService every 10s

**Analyzers:**

| Analyzer | Ecosystem | Method |
|----------|-----------|--------|
| `distro-debian` | Debian/Ubuntu | `dpkg-query`, `apt-cache` |
| `distro-rpm` | RHEL/Fedora/CentOS | RPM database |
| `distro-alpine` | Alpine Linux | APK database |
| `lang-java` | Java/Maven/Gradle | JAR manifests, `pom.xml`, `build.gradle` |
| `lang-node` | Node.js/npm | `package.json`, `package-lock.json` |
| `lang-python` | Python/pip | `requirements.txt`, `Pipfile`, wheel metadata |
| `lang-go` | Golang | `go.mod`, binary parsing |
| `native` | C/C++ | ELF symbol tables, version symbols |

**Reachability:**
- Builds call graph (CG) with nodes/edges in PostgreSQL `cg_node`, `cg_edge`
- Determines if vulnerable functions are callable from entrypoints
- Flags findings as `REACHABLE`/`UNREACHABLE`/`UNKNOWN`

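
The reachability decision described above amounts to a graph search from the entrypoints. The sketch below uses a breadth-first traversal over an adjacency map; it is an illustration only, since the actual analysis reads `cg_node` and `cg_edge` from PostgreSQL. The function names are hypothetical, and the rule that a function absent from the call graph yields `UNKNOWN` is an assumption made for this example, not something the document specifies.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_from(edges: Dict[str, List[str]], entrypoints: Iterable[str]) -> Set[str]:
    """Return every function reachable from the entrypoints via call edges (BFS)."""
    seen: Set[str] = set(entrypoints)
    queue = deque(seen)
    while queue:
        caller = queue.popleft()
        for callee in edges.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def classify(
    vulnerable_fn: str,
    edges: Dict[str, List[str]],
    entrypoints: Iterable[str],
    known_nodes: Set[str],
) -> str:
    """Classify a vulnerable function as REACHABLE, UNREACHABLE, or UNKNOWN."""
    if vulnerable_fn not in known_nodes:
        # Assumption: no call-graph data for this function means no verdict.
        return "UNKNOWN"
    if vulnerable_fn in reachable_from(edges, entrypoints):
        return "REACHABLE"
    return "UNREACHABLE"
```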

---

### Advisory Ingestion

#### Concelier.WebService

**Port:** 8445 (HTTP)
**Database:** `vuln` schema (`advisory_raw`, `linksets`)
**Dependencies:** Scheduler (webhook for delta events), Upstream sources (connectors)

**Responsibilities:**
- **Advisory Ingestion:** Fetches vulnerabilities from NVD, Red Hat, Debian, Ubuntu, GitHub, etc.
- **Normalization:** Converts vendor formats to canonical Concelier advisory JSON
- **Linkset Computation:** Maps CVE IDs to PURLs/CPEs with version ranges
- **AOC Enforcement:** Append-only writes to `advisory_raw` (immutable after insert)
- **Delta Detection:** Detects new advisories and emits webhook to Scheduler
- **Merge Engine:** Deduplicates advisories across sources with priority rules

**Connectors:**

| Connector | Source | Update Frequency |
|-----------|--------|------------------|
| `nvd` | NVD CVE JSON | Hourly |
| `redhat` | Red Hat OVAL | Every 6 hours |
| `debian` | Debian Security Tracker | Every 6 hours |
| `ubuntu` | Ubuntu CVE Tracker | Every 6 hours |
| `github` | GitHub Advisory Database | Hourly |
| `alpine` | Alpine SecDB | Every 6 hours |
| `osv` | OSV.dev | Hourly |

**Linkset API:**
- `/v1/lnm/linksets/{advisoryId}` - Returns PURL/CPE mappings for a CVE
- Consumed by Scanner for enrichment

**PostgreSQL Logical Replication:**
- `advisory_raw_stream` → Policy Engine (tenant-scoped replication)

#### Concelier.Worker

**Dependencies:** Concelier.WebService (internal API), Upstream advisory sources

**Responsibilities:**
- **Scheduled Fetching:** Polls connectors on cron schedules
- **Delta Computation:** Compares fetched data with last snapshot
- **Advisory Normalization:** Parses OVAL, JSON, XML into canonical format
- **Database Insert:** Writes to `advisory_raw` via Concelier.WebService API

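
The delta computation step (comparing fetched data with the last snapshot) can be sketched by fingerprinting each advisory and comparing fingerprints. This is one plausible approach, assumed for illustration; the document does not say how Concelier.Worker detects changes. The function names, the SHA-256 fingerprint over canonical JSON, and the advisory IDs used when exercising it are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(advisory: dict) -> str:
    """Hash a canonical JSON rendering so key order does not affect the result."""
    canonical = json.dumps(advisory, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compute_delta(
    last_snapshot: Dict[str, str],  # advisory ID -> fingerprint from the last run
    fetched: Dict[str, dict],       # advisory ID -> freshly fetched document
) -> Dict[str, List[str]]:
    """Return the advisory IDs that are new or whose content changed."""
    added: List[str] = []
    changed: List[str] = []
    for advisory_id, document in fetched.items():
        previous = last_snapshot.get(advisory_id)
        if previous is None:
            added.append(advisory_id)
        elif previous != fingerprint(document):
            changed.append(advisory_id)
    return {"added": sorted(added), "changed": sorted(changed)}
```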

---

### VEX Ingestion

#### Excititor.WebService

**Port:** 8446 (HTTP)
**Database:** `vex` schema (`vex_raw`, `consensus`)
**Dependencies:** IssuerDirectory (trust verification), Scheduler (webhook for delta events)

**Responsibilities:**
- **VEX Ingestion:** Fetches OpenVEX and CSAF VEX documents from vendors
- **DSSE Verification:** Validates in-toto signatures on VEX statements
- **Trust Scoring:** Applies trust weights to issuers from IssuerDirectory
- **Consensus Computation:** Resolves conflicts when multiple VEX statements disagree
- **AOC Enforcement:** Append-only writes to `vex_raw` (immutable after insert)
- **Delta Detection:** Detects new VEX statements and emits webhook to Scheduler

**VEX Sources:**

| Source | Format | Signature |
|--------|--------|-----------|
| Red Hat VEX | CSAF VEX | PGP-signed |
| CISA VEX | OpenVEX | DSSE in-toto |
| Vendor VEX | OpenVEX | DSSE in-toto |

**Consensus Algorithm:**
- Weighted voting based on issuer trust scores
- Tie-breaking: Most conservative status wins (e.g., `affected` > `not_affected`)
- Result stored in `consensus` table with provenance

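
The consensus rules above (weighted voting, then the most conservative status on a tie) can be sketched as follows. This is a hedged illustration rather than Excititor's algorithm: the document only states that `affected` outranks `not_affected`, so the full ordering in `CONSERVATIVE_ORDER` is an assumption, as are the function name and the treatment of unknown issuers as weight 0.0.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Most conservative first. Only `affected` > `not_affected` comes from the
# document; the placement of the other two statuses is an assumption.
CONSERVATIVE_ORDER = ["affected", "under_investigation", "fixed", "not_affected"]


def consensus(statements: List[Tuple[str, str]], trust: Dict[str, float]) -> str:
    """Pick a status from (issuer, status) pairs by trust-weighted vote."""
    if not statements:
        raise ValueError("no VEX statements to reconcile")
    totals: Dict[str, float] = defaultdict(float)
    for issuer, status in statements:
        # Assumption: issuers missing from the trust table carry no weight.
        totals[status] += trust.get(issuer, 0.0)
    best = max(totals.values())
    tied = [status for status, weight in totals.items() if abs(weight - best) < 1e-9]
    # Among tied statuses, the one earliest in CONSERVATIVE_ORDER wins.
    return min(tied, key=CONSERVATIVE_ORDER.index)
```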

**PostgreSQL Logical Replication:**
- `vex_raw_stream` → Policy Engine (tenant-scoped replication)

#### Excititor.Worker

**Dependencies:** Excititor.WebService (internal API), IssuerDirectory (trust lookup)

**Responsibilities:**
- **Scheduled Fetching:** Polls VEX sources on cron schedules
- **Signature Verification:** Validates DSSE envelopes via IssuerDirectory
- **Trust Verification:** Checks issuer is in trusted list
- **Database Insert:** Writes to `vex_raw` via Excititor.WebService API

---

### Policy Engine

#### Policy.Gateway

**Port:** 8447 (HTTP)
**Database:** `policy` schema (`exception_objects`, `snapshots`, `unknowns`)
**Dependencies:** Policy Engine (OPA/Rego), Authority (auth)

**Responsibilities:**
- **Policy Evaluation Gateway:** Proxies requests to OPA/Rego engine
- **Exception Management:** Stores approved false positives, waivers
- **Approval Workflows:** Multi-stage approval for policy exceptions
- **Delta Computation:** Compares baseline vs. current scan for policy drift
- **Unknowns Tracking:** Records unresolved CVEs (no fix available)

**API Endpoints:**

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/v1/policy/evaluate` | POST | Evaluate policy against scan results |
| `/v1/policy/exceptions` | POST | Create exception request |
| `/v1/policy/exceptions/{id}/approve` | POST | Approve exception |
| `/v1/policy/unknowns` | GET | List unresolved findings |

**Policy Data Sources:**
- PostgreSQL logical replication from `advisory_raw_stream`, `vex_raw_stream`
- Real-time advisory and VEX data for policy eval

#### Policy Engine (OPA/Rego)

**Container:** Separate OPA container, called via HTTP by Policy.Gateway
**Language:** Rego policies
**Data Sources:** PostgreSQL logical replication streams

**Policies:**
- `unknowns-budget.rego` - Limits unresolved CVEs (no fix available)
- `severity-gates.rego` - Blocks based on CVSS severity
- `reachability-gates.rego` - Allows unreachable findings
- `vex-override.rego` - Applies VEX `not_affected` status


---

### Orchestration & Scheduling

#### Scheduler.WebService

**Port:** 8448 (HTTP)
**Database:** `scheduler` schema (`graph_jobs`, `runs`, `schedules`, `impact_snapshots`)
**Dependencies:** Scanner (re-scan requests), Cartographer (export notifications, optional)

**Responsibilities:**
- **Impact Selection:** When advisories/VEX change, identifies affected images via BOM-Index
- **Re-scan Orchestration:** Enqueues re-scan jobs to Scanner.WebService
- **Rate Limiting:** Enforces max concurrent scans, maintenance windows
- **Schedule Management:** Manages periodic scan schedules (cron)
- **Webhook Ingestion:** Receives delta events from Concelier, Excititor

**Webhook Endpoints:**

| Endpoint | Source | Payload |
|----------|--------|---------|
| `/webhooks/concelier` | Concelier | Advisory delta event |
| `/webhooks/excititor` | Excititor | VEX delta event |

**Impact Selection Algorithm:**
1. Receive advisory delta (CVE IDs added)
2. Query BOM-Index for images containing affected PURLs
3. Batch impacted images (max 100 per run)
4. Enforce rate limits and maintenance windows
5. Enqueue re-scans to Scanner.WebService

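
Steps 2 and 3 of the impact selection algorithm can be sketched as a lookup followed by fixed-size batching. The BOM-Index is modelled here as a plain dictionary from PURL to image identifiers, which is an assumption for illustration; the document does not describe its storage or query interface. `impacted_images` and `batches` are hypothetical names, and only the batch size of 100 comes from the text.

```python
from typing import Dict, Iterable, Iterator, List, Set

MAX_BATCH = 100  # from the document: max 100 images per run


def impacted_images(bom_index: Dict[str, Set[str]], affected_purls: Iterable[str]) -> List[str]:
    """Union of all images containing any affected PURL, in stable order."""
    images: Set[str] = set()
    for purl in affected_purls:
        images |= bom_index.get(purl, set())  # PURLs absent from the index match nothing
    return sorted(images)


def batches(images: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` images."""
    for start in range(0, len(images), size):
        yield images[start:start + size]
```

Deduplicating before batching matters when one image contains several affected packages: it is re-scanned once per delta rather than once per matching PURL.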

#### Scheduler.Worker

**Dependencies:** Scheduler.WebService (internal API), Scanner.WebService (HTTP)

**Responsibilities:**
- **Job Execution:** Claims jobs from Scheduler.WebService
- **Batch Processing:** Processes impacted image batches
- **Re-scan Trigger:** HTTP POST to Scanner `/v1/scans` with `rescan=true`
- **Progress Reporting:** Heartbeat to Scheduler every 10s

#### Orchestrator.WebService

**Port:** 8449 (HTTP)
**Database:** `orchestrator` schema (`sources`, `runs`, `jobs`, `dags`, `pack_runs`)

**Responsibilities:**
- **DAG Workflows:** Manages directed acyclic graph job dependencies
- **Pack Runs:** Bundles multiple jobs into atomic runs
- **Job Streaming:** WebSocket endpoints for real-time job status
- **Worker Coordination:** Job claim, heartbeat, completion tracking

**Use Cases:**
- Complex multi-step workflows (e.g., scan → policy → VEX → attest)
- Batch operations (e.g., scan all images in namespace)

---

### Notification Engine

#### Notify.WebService

**Port:** 8450 (HTTP)
**Database:** `notify` schema (`channels`, `templates`, `delivery_history`, `digest_state`)
**Dependencies:** Valkey (delivery queue), Scanner (event subscription)

**Responsibilities:**
- **Channel Management:** Configures Slack, Teams, Email, Webhook channels
- **Template Engine:** Renders notification templates with Liquid syntax
- **Throttling:** Rate limits notifications (max N per hour per channel)
- **Digest Mode:** Batches notifications into hourly/daily digests
- **Event Subscription:** Subscribes to `report.ready` events from Scanner

**Channel Types:**

| Channel | Protocol | Configuration |
|---------|----------|---------------|
| Slack | HTTP (Slack API) | Bot token, channel ID |
| Teams | HTTP (webhook) | Webhook URL |
| Email | SMTP | SMTP server, credentials |
| Webhook | HTTP | URL, auth headers |

**Delivery Queue:**
- Publishes to Valkey Stream: `notify:delivery`
- Notify.Worker consumes via `XREADGROUP`

#### Notify.Worker

**Dependencies:** Notify.WebService (internal API), External services (Slack/Teams/SMTP)

**Responsibilities:**
- **Job Claim:** Claims delivery jobs from Valkey queue
- **Template Rendering:** Renders Liquid templates with event data
- **Delivery Execution:** HTTP/SMTP delivery with retries (exponential backoff)
- **Idempotency:** Tracks delivery IDs to prevent duplicates
- **SLO Tracking:** Records delivery latency for P95 monitoring

**Retry Policy:**
- Max 3 retries
- Backoff: 1s, 5s, 15s
- Dead-letter queue after exhaustion

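
The retry policy and the idempotency rule combine naturally into one delivery routine, sketched below. The backoff schedule (1s, 5s, 15s) and the three-retry limit come from the document; everything else is an assumption for illustration: the `deliver` function, the in-memory `delivered_ids` set and `dead_letters` list standing in for persistent state, and the choice to treat any exception from `send` as retryable. The `sleep` parameter is injectable so the schedule can be observed without waiting.

```python
import time
from typing import Callable, List, Set

BACKOFF_SECONDS = [1, 5, 15]  # from the document: max 3 retries


def deliver(
    delivery_id: str,
    send: Callable[[], None],
    delivered_ids: Set[str],
    dead_letters: List[str],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Attempt a delivery once plus up to three retries; return True on success."""
    if delivery_id in delivered_ids:
        return True  # idempotency: this delivery already succeeded

    attempts = 1 + len(BACKOFF_SECONDS)  # initial try plus retries
    for attempt in range(attempts):
        try:
            send()
        except Exception:  # assumption: every failure is treated as transient
            if attempt < len(BACKOFF_SECONDS):
                sleep(BACKOFF_SECONDS[attempt])
                continue
            dead_letters.append(delivery_id)  # retries exhausted
            return False
        delivered_ids.add(delivery_id)
        return True
    return False
```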
---
|
|
|
|
### Cryptographic Services
|
|
|
|
#### Signer.WebService
|
|
|
|
**Port:** 8441 (HTTPS with mTLS)
|
|
**Dependencies:** Authority (PoE validation), External KMS (AWS/GCP/PKCS11), OCI Registry (digest verification)
|
|
|
|
**Responsibilities:**
|
|
- **DSSE Signing:** Signs in-toto envelopes (SBOMs, VEX, attestations)
|
|
- **PoE Validation:** License check via Authority introspection
|
|
- **Scanner Authenticity:** Verifies scanner image digest is Stella Ops-signed
|
|
- **Multi-Profile Keys:** FIPS, GOST, SM for regulatory compliance
|
|
- **Key Rotation:** Automated key rotation with overlap period
|
|
|
|
**Hard Gates:**
|
|
1. OpTok validation (DPoP + mTLS)
|
|
2. PoE license check (fails if expired)
|
|
3. Scanner image digest verification (must be cosign-signed by Stella Ops)
|
|
|
|
**Key Storage:**
|
|
|
|
| Storage | Use Case |
|
|
|---------|----------|
|
|
| In-memory | Development |
|
|
| PKCS11 HSM | On-prem production |
|
|
| AWS KMS | AWS cloud deployments |
|
|
| GCP KMS | GCP cloud deployments |
|
|
|
|
#### Attestor.WebService
|
|
|
|
**Port:** 8442 (HTTPS)
|
|
**Dependencies:** Signer (DSSE signing), Rekor v2 (transparency log)
|
|
|
|
**Responsibilities:**
|
|
- **Rekor Submission:** Posts DSSE bundles to Sigstore Rekor v2
|
|
- **Receipt Retrieval:** Fetches inclusion proofs (Merkle tree path)
|
|
- **Offline Bundles:** Packages DSSE + Rekor receipt for airgap verification
|
|
- **Verification API:** Validates Rekor receipts for CLI/CI
|
|
|
|
**Workflow:**
|
|
1. Receive DSSE envelope from Scanner/Excititor
|
|
2. Call Signer for signature (mTLS with OpTok)
|
|
3. Submit signed DSSE to Rekor v2 (`/api/v2/entries`)
|
|
4. Retrieve inclusion proof from Rekor
|
|
5. Return proof bundle to caller
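
The inclusion proof in the workflow above is a Merkle audit path. The sketch below shows how such a path can be checked against a tree root, assuming RFC 6962-style SHA-256 hashing (leaf prefix `0x00`, node prefix `0x01`) and the verification procedure described in RFC 9162. It is illustrative only: the function names are hypothetical, and the actual Attestor verification API and receipt format may differ.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    """Hash of a log entry (leaf), using the 0x00 domain prefix."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash of an interior node, using the 0x01 domain prefix."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(
    leaf_index: int, tree_size: int, leaf: bytes, proof: List[bytes], root: bytes
) -> bool:
    """Recompute the root from a leaf hash and its audit path."""
    if not 0 <= leaf_index < tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the tree allows
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            if not fn & 1:
                # Skip levels where this node has no right sibling.
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Two-leaf tree: the proof for leaf 0 is just the hash of leaf 1.
a, b = leaf_hash(b"entry-a"), leaf_hash(b"entry-b")
print(verify_inclusion(0, 2, a, [b], node_hash(a, b)))  # True
```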

**Offline Mode:**
- When `OFFLINEKIT_ENABLED=true`:
  - Uses local timestamp service (no Rekor)
  - Bundles DSSE + TSA timestamp + trust anchors
  - Suitable for airgap deployments

---

### Supporting Services

#### IssuerDirectory.WebService

**Port:** 8451 (HTTP)
**Database:** None (read-only configuration)

**Responsibilities:**
- **Trusted Issuer Registry:** Maintains list of authorized VEX/SBOM signers
- **Trust Weights:** Assigns numerical trust scores (0.0 - 1.0) to issuers
- **Seed Data:** CSAF trusted providers from official lists

**Issuer Manifest:**
```json
{
  "issuers": [
    {
      "id": "redhat",
      "name": "Red Hat Product Security",
      "publicKey": "-----BEGIN PUBLIC KEY-----...",
      "trustWeight": 0.95
    },
    {
      "id": "cisa",
      "name": "CISA Cybersecurity",
      "publicKey": "-----BEGIN PUBLIC KEY-----...",
      "trustWeight": 1.0
    }
  ]
}
```

**API:**
- `/v1/issuers` - List all trusted issuers
- `/v1/issuers/{id}` - Get issuer details
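
As an illustration of how a consumer might read the issuer manifest, the sketch below parses the JSON shape shown above and checks that every trust weight falls in the documented 0.0 to 1.0 range. The function name `load_trust_weights` and the rejection of duplicate IDs are assumptions made for this example, not documented behavior of IssuerDirectory.

```python
import json
from typing import Dict


def load_trust_weights(manifest_json: str) -> Dict[str, float]:
    """Return a mapping of issuer id to trust weight, validating the range."""
    manifest = json.loads(manifest_json)
    weights: Dict[str, float] = {}
    for issuer in manifest["issuers"]:
        issuer_id = issuer["id"]
        weight = float(issuer["trustWeight"])
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"trustWeight out of range for {issuer_id}: {weight}")
        if issuer_id in weights:
            raise ValueError(f"duplicate issuer id: {issuer_id}")
        weights[issuer_id] = weight
    return weights


example = """
{"issuers": [
  {"id": "redhat", "name": "Red Hat Product Security", "trustWeight": 0.95},
  {"id": "cisa", "name": "CISA Cybersecurity", "trustWeight": 1.0}
]}
"""
print(load_trust_weights(example))  # {'redhat': 0.95, 'cisa': 1.0}
```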

---

## Communication Patterns

### 1. Scan Request Flow

```
CLI/UI
  │
  │ POST /v1/scans
  │ { "imageRef": "alpine:latest" }
  ▼
Gateway.WebService
  │
  │ 1. Validate JWT + DPoP
  │ 2. Check rate limits
  │ 3. Route to Scanner
  ▼
Scanner.WebService
  │
  │ 1. Create scan record in PostgreSQL
  │ 2. XADD to Valkey: scanner:jobs
  │
  │ ◄───────────────────────────┐
  │                             │
  ▼                             │
Valkey Stream                   │
scanner:jobs                    │
  │                             │
  │ XREADGROUP (consumer group) │
  ▼                             │
Scanner.Worker                  │
  │                             │
  │ 1. Pull OCI image           │
  │ 2. Extract layers           │
  │ 3. Run analyzers            │
  │ 4. Generate SBOMs           │
  │ 5. Upload to RustFS         │
  │ 6. Query Concelier for linksets
  │    └─► HTTP GET /v1/lnm/linksets/{cveId}
  │                             │
  │ 7. Heartbeat ───────────────┘
  │    POST /internal/jobs/{id}/heartbeat
  │
  │ 8. Complete
  │    POST /internal/jobs/{id}/complete
  ▼
Scanner.WebService
  │
  │ 1. Update scan record (status=completed)
  │ 2. Call Policy.Gateway for verdict
  │    POST /v1/policy/evaluate
  │    └─► Policy.Gateway
  │        └─► Policy Engine (OPA)
  │            └─► PostgreSQL (advisory_raw_stream)
  │
  │ 3. Call Signer for DSSE signature
  │    POST /v1/sign (mTLS + OpTok)
  │    └─► Signer.WebService
  │        ├─► Validate PoE (license)
  │        ├─► Verify scanner digest (cosign)
  │        └─► Sign DSSE envelope
  │
  │ 4. Call Attestor for Rekor submission
  │    POST /v1/attest
  │    └─► Attestor.WebService
  │        ├─► Submit to Rekor v2
  │        └─► Retrieve inclusion proof
  │
  │ 5. Store proof bundle in RustFS
  │ 6. XADD to Valkey: events:report.ready
  │
  ▼
Valkey Stream
events:report.ready
  │
  │ XREADGROUP
  ▼
Notify.WebService
  │
  │ 1. Render template
  │ 2. XADD to Valkey: notify:delivery
  │
  ▼
Valkey Stream
notify:delivery
  │
  │ XREADGROUP
  ▼
Notify.Worker
  │
  │ 1. Claim delivery job
  │ 2. HTTP POST to Slack API
  │ 3. Mark complete
  ▼
Slack Channel
```

**Communication Summary:**
1. **Gateway → Scanner:** HTTP POST (JWT + DPoP auth)
2. **Scanner → Valkey:** XADD (queue job)
3. **Worker → Valkey:** XREADGROUP (consume job)
4. **Worker → Scanner:** HTTP POST (heartbeat, completion)
5. **Worker → Concelier:** HTTP GET (linkset query)
6. **Scanner → Policy:** HTTP POST (policy eval)
7. **Scanner → Signer:** HTTP POST mTLS (DSSE signing)
8. **Scanner → Attestor:** HTTP POST (Rekor submission)
9. **Scanner → Valkey:** XADD (event publish)
10. **Notify → Valkey:** XREADGROUP (event consume)
11. **Notify Worker → Slack:** HTTP POST (delivery)
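
The queueing steps above rely on consumer-group semantics: each entry is handed to exactly one consumer in the group and stays pending until it is acknowledged. The toy class below imitates that behavior in memory so the pattern can be followed without a running Valkey instance. It is not a Valkey client; the method names only echo the `XADD`, `XREADGROUP`, and `XACK` commands, and details such as ID format, blocking reads, and claiming stale entries are deliberately left out.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class ToyStream:
    """In-memory stand-in for one stream with a single consumer group."""

    def __init__(self) -> None:
        self._next_id = 1
        self._undelivered: Deque[Tuple[int, dict]] = deque()
        self._pending: Dict[int, Tuple[str, dict]] = {}

    def xadd(self, fields: dict) -> int:
        """Append an entry and return its id."""
        entry_id = self._next_id
        self._next_id += 1
        self._undelivered.append((entry_id, fields))
        return entry_id

    def xreadgroup(self, consumer: str, count: int = 1) -> List[Tuple[int, dict]]:
        """Hand up to `count` never-delivered entries to one consumer."""
        batch = []
        while self._undelivered and len(batch) < count:
            entry_id, fields = self._undelivered.popleft()
            self._pending[entry_id] = (consumer, fields)  # awaiting ack
            batch.append((entry_id, fields))
        return batch

    def xack(self, entry_id: int) -> bool:
        """Acknowledge an entry; returns False if it was not pending."""
        return self._pending.pop(entry_id, None) is not None

    def pending(self) -> List[int]:
        """Ids that were delivered but not yet acknowledged."""
        return sorted(self._pending)


jobs = ToyStream()
jobs.xadd({"imageRef": "alpine:latest"})
jobs.xadd({"imageRef": "alpine:3.19"})
first = jobs.xreadgroup("worker-1")   # worker-1 receives entry 1
second = jobs.xreadgroup("worker-2")  # worker-2 receives entry 2, not entry 1
jobs.xack(first[0][0])
print(jobs.pending())  # [2]
```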

---

### 2. Advisory Update Flow

```
Concelier.Worker (cron: every hour)
  │
  │ 1. Fetch NVD CVE JSON feed
  │    HTTPS GET https://services.nvd.nist.gov/rest/json/cves/2.0
  │
  │ 2. Parse and normalize
  │ 3. POST to Concelier.WebService
  │    POST /internal/ingest
  ▼
Concelier.WebService
  │
  │ 1. Validate advisory format
  │ 2. Compute linksets (CVE → PURL/CPE)
  │ 3. INSERT INTO vuln.advisory_raw (AOC: append-only)
  │ 4. Detect delta (new CVEs)
  │ 5. Webhook POST to Scheduler
  │    POST /webhooks/concelier
  │    {
  │      "cveIds": ["CVE-2024-1234"],
  │      "timestamp": "2025-12-23T12:00:00Z"
  │    }
  │
  ▼
Scheduler.WebService
  │
  │ 1. Query BOM-Index for impacted images
  │    SELECT DISTINCT image_ref
  │    FROM scanner.inventory
  │    WHERE purl IN (
  │      SELECT purl FROM vuln.linksets
  │      WHERE cve_id IN ('CVE-2024-1234')
  │    )
  │
  │ 2. Batch results (max 100 images/run)
  │ 3. Enforce rate limits (max 10 scans/min)
  │ 4. Enqueue to Scheduler.Worker
  │
  ▼
Scheduler.Worker
  │
  │ 1. Claim job from Scheduler
  │ 2. For each image:
  │    POST /v1/scans
  │    {
  │      "imageRef": "alpine:latest",
  │      "rescan": true,
  │      "reason": "advisory-delta"
  │    }
  │    └─► Scanner.WebService
  │        └─► [Standard scan flow]
  │
  │ 3. Heartbeat to Scheduler
  │ 4. Complete job
  ▼
Scanner.WebService
(Re-scan executes, new report generated)
```

**Communication Summary:**
1. **Concelier.Worker → NVD:** HTTPS GET (fetch advisories)
2. **Concelier.Worker → Concelier.Web:** HTTP POST (ingest)
3. **Concelier.Web → PostgreSQL:** INSERT (advisory storage)
4. **Concelier.Web → Scheduler:** HTTP POST webhook (delta event)
5. **Scheduler → PostgreSQL:** SELECT (BOM-Index query for impacted images)
6. **Scheduler.Worker → Scanner:** HTTP POST (re-scan requests)
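
The impact selection and batching steps can be tried out locally. The sketch below runs the same shape of query against an in-memory SQLite database and splits the result into runs of at most 100 images. SQLite stands in for PostgreSQL purely for illustration, so the schema-qualified names become the hypothetical tables `scanner_inventory` and `vuln_linksets`, and the helper names are assumptions.

```python
import sqlite3
from typing import Iterator, List, Sequence

MAX_IMAGES_PER_RUN = 100  # batch limit from the flow above


def impacted_images(conn: sqlite3.Connection, cve_ids: Sequence[str]) -> List[str]:
    """Return image refs whose inventory contains a PURL linked to any given CVE."""
    if not cve_ids:
        return []
    placeholders = ",".join("?" for _ in cve_ids)  # one bound parameter per CVE
    sql = f"""
        SELECT DISTINCT image_ref
        FROM scanner_inventory
        WHERE purl IN (
            SELECT purl FROM vuln_linksets WHERE cve_id IN ({placeholders})
        )
        ORDER BY image_ref
    """
    return [row[0] for row in conn.execute(sql, list(cve_ids))]


def batches(items: List[str], size: int = MAX_IMAGES_PER_RUN) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scanner_inventory (image_ref TEXT, purl TEXT)")
conn.execute("CREATE TABLE vuln_linksets (cve_id TEXT, purl TEXT)")
conn.executemany(
    "INSERT INTO scanner_inventory VALUES (?, ?)",
    [
        ("alpine:latest", "pkg:alpine/openssl@3.0.1"),
        ("nginx:1.25", "pkg:alpine/openssl@3.0.1"),
        ("busybox:1.36", "pkg:alpine/musl@1.2.4"),
    ],
)
conn.execute(
    "INSERT INTO vuln_linksets VALUES (?, ?)",
    ("CVE-2024-1234", "pkg:alpine/openssl@3.0.1"),
)
print(impacted_images(conn, ["CVE-2024-1234"]))  # ['alpine:latest', 'nginx:1.25']
```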

---

### 3. VEX Update Flow

```
Excititor.Worker (cron: every 6 hours)
  │
  │ 1. Fetch Red Hat CSAF VEX feed
  │    HTTPS GET https://www.redhat.com/security/data/csaf/
  │
  │ 2. Parse CSAF JSON
  │ 3. Verify PGP signature
  │ 4. POST to Excititor.WebService
  │    POST /internal/ingest
  ▼
Excititor.WebService
  │
  │ 1. Verify DSSE signature
  │    └─► IssuerDirectory.WebService
  │        GET /v1/issuers/{issuerId}
  │        (Retrieve public key + trust weight)
  │
  │ 2. Validate signature with issuer public key
  │ 3. INSERT INTO vex.vex_raw (AOC: append-only)
  │ 4. Compute consensus (if multiple VEX for same CVE)
  │    UPDATE vex.consensus
  │ 5. Detect delta (new VEX statements)
  │ 6. Webhook POST to Scheduler
  │    POST /webhooks/excititor
  │    {
  │      "cveIds": ["CVE-2024-5678"],
  │      "status": "not_affected",
  │      "timestamp": "2025-12-23T18:00:00Z"
  │    }
  │
  ▼
Scheduler.WebService
  │
  │ 1. Query BOM-Index for impacted images
  │    (Same as advisory flow, but for VEX changes)
  │
  │ 2. Enqueue analysis-only jobs
  │    (No full re-scan, just re-evaluate policy with new VEX)
  │
  ▼
Scheduler.Worker
  │
  │ For each image:
  │   POST /v1/scans/{scanId}/reanalyze
  │   └─► Scanner.WebService
  │       └─► Policy.Gateway (re-evaluate with new VEX)
  ▼
Scanner.WebService
(Policy re-evaluation, verdict updated)
```

**Communication Summary:**
1. **Excititor.Worker → VEX source:** HTTPS GET (fetch VEX)
2. **Excititor.Worker → Excititor.Web:** HTTP POST (ingest)
3. **Excititor.Web → IssuerDirectory:** HTTP GET (trust verification)
4. **Excititor.Web → PostgreSQL:** INSERT (VEX storage)
5. **Excititor.Web → Scheduler:** HTTP POST webhook (delta event)
6. **Scheduler.Worker → Scanner:** HTTP POST (re-analyze request)
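
The consensus step in this flow combines several VEX statements for the same CVE using issuer trust weights. The exact algorithm is not specified here, so the sketch below shows one plausible reading: sum the trust weight behind each status and pick the status with the largest total. The function name, the alphabetical tie-break, and the treatment of unknown issuers as weight 0.0 are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def weighted_consensus(
    statements: Iterable[Tuple[str, str]],
    trust_weights: Dict[str, float],
) -> Optional[str]:
    """Pick the VEX status backed by the highest total issuer trust.

    statements: (issuer_id, status) pairs for one CVE.
    Returns None when no statement comes from a trusted issuer.
    """
    totals: Dict[str, float] = defaultdict(float)
    for issuer_id, status in statements:
        totals[status] += trust_weights.get(issuer_id, 0.0)  # unknown issuer adds nothing
    if not totals or max(totals.values()) == 0.0:
        return None
    # Iterating in sorted order makes ties resolve to the alphabetically first status.
    return max(sorted(totals), key=lambda status: totals[status])


weights = {"redhat": 0.95, "cisa": 1.0, "vendor": 0.5}
print(weighted_consensus([("redhat", "not_affected"), ("cisa", "affected")], weights))
# affected (1.0 outweighs 0.95)
```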

---

### 4. Notification Delivery Flow

```
Scanner.WebService
(Scan completed, verdict computed)
  │
  │ XADD to Valkey Stream
  │ events:report.ready
  │ {
  │   "scanId": "scan-123",
  │   "imageRef": "alpine:latest",
  │   "verdict": "FAIL",
  │   "criticalCount": 3
  │ }
  ▼
Valkey Stream: events:report.ready
  │
  │ XREADGROUP (consumer group: notify-delivery)
  ▼
Notify.WebService
  │
  │ 1. SELECT channel config from PostgreSQL
  │    (Slack, Teams, Email channels)
  │
  │ 2. SELECT template from PostgreSQL
  │    (Liquid template: "New vulnerabilities found...")
  │
  │ 3. Check throttle limits
  │    (Max 10 notifications/hour per channel)
  │
  │ 4. Render template with event data
  │
  │ 5. XADD to Valkey Stream
  │    notify:delivery
  │    {
  │      "channelId": "slack-security",
  │      "renderedMessage": "🚨 Critical: 3 vulns in alpine:latest",
  │      "deliveryId": "delivery-456"
  │    }
  ▼
Valkey Stream: notify:delivery
  │
  │ XREADGROUP (consumer group: notify-workers)
  ▼
Notify.Worker
  │
  │ 1. Claim delivery job
  │ 2. Check idempotency (deliveryId seen before?)
  │ 3. HTTP POST to Slack API
  │    POST https://slack.com/api/chat.postMessage
  │    {
  │      "channel": "#security",
  │      "text": "🚨 Critical: 3 vulns in alpine:latest",
  │      "attachments": [...]
  │    }
  │
  │ 4. Record delivery in PostgreSQL
  │    INSERT INTO notify.delivery_history
  │ 5. XACK to Valkey (mark complete)
  ▼
Slack Channel #security
```

**Communication Summary:**
1. **Scanner → Valkey:** XADD (publish event)
2. **Notify.Web → Valkey:** XREADGROUP (consume event)
3. **Notify.Web → PostgreSQL:** SELECT (channel config, template)
4. **Notify.Web → Valkey:** XADD (queue delivery job)
5. **Notify.Worker → Valkey:** XREADGROUP (consume delivery job)
6. **Notify.Worker → Slack API:** HTTP POST (deliver notification)
7. **Notify.Worker → PostgreSQL:** INSERT (delivery history)
8. **Notify.Worker → Valkey:** XACK (acknowledge completion)
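
Two checks in this flow lend themselves to a short example: the per-channel throttle (max 10 notifications per hour) and the delivery ID idempotency check. The sketch below keeps both in memory and assumes a sliding one-hour window; the class and method names are hypothetical, and the real services are described as persisting this state in PostgreSQL rather than in process memory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class DeliveryGuard:
    """Sliding-window throttle per channel plus a seen-set for delivery ids."""

    def __init__(self, max_per_window: int = 10, window_seconds: float = 3600.0) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._sent: Dict[str, Deque[float]] = defaultdict(deque)
        self._seen_ids: Set[str] = set()

    def allow(self, channel_id: str, now: float) -> bool:
        """Return True and record the send if the channel is under its limit."""
        window = self._sent[channel_id]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_per_window:
            return False
        window.append(now)
        return True

    def first_time(self, delivery_id: str) -> bool:
        """Return True only the first time a delivery id is presented."""
        if delivery_id in self._seen_ids:
            return False
        self._seen_ids.add(delivery_id)
        return True


guard = DeliveryGuard()
results = [guard.allow("slack-security", now=float(t)) for t in range(11)]
print(results.count(True))  # 10 (the 11th notification within the hour is throttled)
```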

---

### 5. Policy Evaluation Flow

```
Scanner.WebService
(Scan completed, SBOM generated)
  │
  │ POST /v1/policy/evaluate
  │ {
  │   "scanId": "scan-123",
  │   "findings": [
  │     {
  │       "cveId": "CVE-2024-1234",
  │       "purl": "pkg:alpine/openssl@3.0.1",
  │       "severity": "CRITICAL",
  │       "reachability": "REACHABLE"
  │     }
  │   ]
  │ }
  ▼
Policy.Gateway
  │
  │ 1. SELECT exceptions from PostgreSQL
  │    (Check for approved false positives)
  │
  │ 2. POST /v1/policy/eval to Policy Engine
  │    └─► Policy Engine (OPA/Rego)
  │        │
  │        │ Data sources:
  │        ├─► PostgreSQL logical replication
  │        │   • advisory_raw_stream (advisory data)
  │        │   • vex_raw_stream (VEX data)
  │        │
  │        │ Policy rules:
  │        ├─► unknowns-budget.rego
  │        │   (Limit unresolved CVEs to max 10)
  │        ├─► severity-gates.rego
  │        │   (Block CRITICAL, allow HIGH with approval)
  │        ├─► reachability-gates.rego
  │        │   (Allow UNREACHABLE findings)
  │        └─► vex-override.rego
  │            (If VEX status = not_affected, allow)
  │
  │ 3. Return verdict
  │    {
  │      "verdict": "FAIL",
  │      "blockedFindings": [
  │        {
  │          "cveId": "CVE-2024-1234",
  │          "reason": "CRITICAL severity + REACHABLE"
  │        }
  │      ],
  │      "allowedFindings": [
  │        {
  │          "cveId": "CVE-2024-5678",
  │          "reason": "VEX not_affected"
  │        }
  │      ]
  │    }
  ▼
Scanner.WebService
  │
  │ 1. Store verdict in PostgreSQL
  │    UPDATE scanner.scan_manifests
  │    SET verdict = 'FAIL'
  │ 2. Return to caller
  ▼
CLI/UI
```

**Communication Summary:**
1. **Scanner → Policy.Gateway:** HTTP POST (policy eval request)
2. **Policy.Gateway → PostgreSQL:** SELECT (exceptions)
3. **Policy.Gateway → Policy Engine:** HTTP POST (OPA eval)
4. **Policy Engine → PostgreSQL:** Logical replication read (advisory/VEX data)
5. **Policy.Gateway → Scanner:** HTTP 200 (verdict response)
6. **Scanner → PostgreSQL:** UPDATE (store verdict)
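
To make the gate logic concrete, the sketch below re-expresses three of the rules (VEX override, reachability, severity) in plain Python and produces a verdict in the same shape as the response above. The real rules are described as Rego policies evaluated by OPA; the precedence order used here (VEX first, then reachability, then severity), the `Finding` fields, and the omission of the unknowns budget are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Finding:
    cve_id: str
    severity: str                     # e.g. "CRITICAL", "HIGH"
    reachability: str                 # e.g. "REACHABLE", "UNREACHABLE"
    vex_status: Optional[str] = None  # e.g. "not_affected"
    approved: bool = False            # hypothetical approval flag for HIGH findings


def evaluate(findings: List[Finding]) -> Dict[str, object]:
    """Apply the gates in an assumed order and build a verdict document."""
    blocked: List[Dict[str, str]] = []
    allowed: List[Dict[str, str]] = []
    for f in findings:
        if f.vex_status == "not_affected":
            allowed.append({"cveId": f.cve_id, "reason": "VEX not_affected"})
        elif f.reachability == "UNREACHABLE":
            allowed.append({"cveId": f.cve_id, "reason": "UNREACHABLE"})
        elif f.severity == "CRITICAL":
            reason = f"{f.severity} severity + {f.reachability}"
            blocked.append({"cveId": f.cve_id, "reason": reason})
        elif f.severity == "HIGH" and not f.approved:
            blocked.append({"cveId": f.cve_id, "reason": "HIGH severity without approval"})
        else:
            allowed.append({"cveId": f.cve_id, "reason": "within policy"})
    return {
        "verdict": "FAIL" if blocked else "PASS",
        "blockedFindings": blocked,
        "allowedFindings": allowed,
    }


result = evaluate([
    Finding("CVE-2024-1234", "CRITICAL", "REACHABLE"),
    Finding("CVE-2024-5678", "CRITICAL", "REACHABLE", vex_status="not_affected"),
])
print(result["verdict"])  # FAIL
```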

---

## Database Schema Isolation

Each service has a dedicated PostgreSQL schema for strict isolation:

### authority

**Owner:** Authority.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `users` | User accounts (LDAP-synced or local) |
| `clients` | OAuth2 clients (service accounts) |
| `tenants` | Multi-tenant organization data |
| `keys` | Signing keys (JWK format) |
| `tokens` | OpTok refresh tokens (rotation-protected) |
| `audit_log` | Authentication/authorization events |
| `dpop_nonces` | (Migrated to Valkey for performance) |

**Indexes:**
- `users.email` (unique)
- `clients.client_id` (unique)
- `tenants.slug` (unique)
- `audit_log.timestamp, tenant_id` (composite)

### scanner

**Owner:** Scanner.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `scan_manifests` | Scan metadata, status, verdicts |
| `proof_bundles` | DSSE envelopes + Rekor receipts |
| `triage` | False positives, waiver approvals |
| `epss` | EPSS scores (daily refresh from FIRST.org) |
| `cg_node` | Call graph nodes (functions) |
| `cg_edge` | Call graph edges (function calls) |
| `inventory` | Package inventory (PURL → scan mapping) |

**Indexes:**
- `scan_manifests.image_ref, created_at` (composite)
- `inventory.purl` (GIN index for LIKE queries)
- `cg_node.function_signature` (unique)
- `cg_edge.source_id, target_id` (composite)

### vuln

**Owner:** Concelier.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `advisory_raw` | Immutable advisory documents (AOC) |
| `linksets` | CVE → PURL/CPE mappings with version ranges |
| `observations` | Merge conflicts, priority overrides |

**Logical Replication:**
- `advisory_raw_stream` → Policy Engine (tenant-scoped)

**Indexes:**
- `advisory_raw.cve_id` (GIN array index)
- `linksets.cve_id, purl` (composite)

### vex

**Owner:** Excititor.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `vex_raw` | Immutable VEX statements (AOC) |
| `consensus` | Resolved VEX status (weighted voting) |
| `provider_state` | Last-fetch timestamps per VEX source |

**Logical Replication:**
- `vex_raw_stream` → Policy Engine (tenant-scoped)

**Indexes:**
- `vex_raw.cve_id, issuer_id` (composite)
- `consensus.cve_id` (unique)

### scheduler

**Owner:** Scheduler.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `graph_jobs` | Re-scan job definitions (advisory/VEX delta) |
| `runs` | Job run instances (status, progress) |
| `schedules` | Cron schedules for periodic scans |
| `impact_snapshots` | BOM-Index query results (cached) |

**Indexes:**
- `runs.job_id, created_at` (composite)
- `impact_snapshots.cve_id` (GIN array index)

### notify

**Owner:** Notify.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `channels` | Slack, Teams, Email, Webhook configs |
| `templates` | Liquid templates for notifications |
| `delivery_history` | Sent notifications (idempotency, SLO tracking) |
| `digest_state` | Digest accumulation (hourly/daily batches) |

**Indexes:**
- `delivery_history.delivery_id` (unique)
- `delivery_history.channel_id, created_at` (composite)

### policy

**Owner:** Policy.Gateway

**Tables:**

| Table | Purpose |
|-------|---------|
| `exception_objects` | Approved false positives, waivers |
| `snapshots` | Policy baseline snapshots for delta |
| `unknowns` | Unresolved CVEs (no fix available) |

**Indexes:**
- `exception_objects.cve_id, image_ref` (composite)
- `unknowns.cve_id` (unique)

### orchestrator

**Owner:** Orchestrator.WebService

**Tables:**

| Table | Purpose |
|-------|---------|
| `sources` | Job sources (Git repos, webhooks) |
| `runs` | Orchestrated run instances |
| `jobs` | Individual jobs within runs |
| `dags` | Job dependency graphs |
| `pack_runs` | Atomic multi-job bundles |

**Indexes:**
- `jobs.run_id, status` (composite)
- `dags.parent_job_id, child_job_id` (composite)

---

## Security Boundaries

### Authentication & Authorization

**All services** enforce:
1. **JWT Validation:** OpTok signature verification (RS256/ES256)
2. **DPoP Verification:** Sender constraint validation (RFC 9449)
3. **Scope-Based Access:** RBAC claims in OpTok (`scan:read`, `policy:write`, etc.)
4. **Tenant Isolation:** All queries filtered by `tenant_id` from OpTok

**Authority Hard Gates:**
- DPoP nonce must be unused (30s TTL in Valkey)
- OpTok must expire less than 15 minutes after issue
- mTLS certificate must match `client_id`
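
The nonce gate above amounts to a "first use wins" check with a 30-second lifetime. The sketch below models it with an in-memory dictionary and an injectable clock so the expiry behavior can be tested; the class name and methods are hypothetical. In a Valkey-backed deployment a comparable effect could be had with a `SET ... NX EX 30` on the nonce key, though how Authority actually stores nonces is not specified here.

```python
import time
from typing import Callable, Dict


class NonceCache:
    """Accept each nonce at most once within its time-to-live."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def consume(self, nonce: str) -> bool:
        """Return True if the nonce is unused; remember it for ttl seconds."""
        now = self._clock()
        # Forget nonces whose lifetime has passed.
        for key in [k for k, exp in self._expires_at.items() if exp <= now]:
            del self._expires_at[key]
        if nonce in self._expires_at:
            return False  # replay within the TTL
        self._expires_at[nonce] = now + self._ttl
        return True


fake_now = [0.0]
cache = NonceCache(clock=lambda: fake_now[0])
print(cache.consume("n-1"))  # True  (first use)
print(cache.consume("n-1"))  # False (replay)
```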

**Signer Hard Gates:**
- PoE (Proof of Entitlement) must reference a valid license
- Scanner image digest must be cosign-signed by Stella Ops
- OpTok must have `sign:dsse` scope

### Network Segmentation

**Production Deployment:**

```
┌─────────────────────────────────────────────────────────────┐
│                       PUBLIC INTERNET                       │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       │ HTTPS (TLS 1.3)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                     LOAD BALANCER / WAF                     │
│  • Rate limiting (IP-based)                                 │
│  • DDoS protection                                          │
│  • TLS termination                                          │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       │ Internal HTTP
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                     DMZ - Gateway Layer                     │
│  ┌────────────────────────────────────────┐                 │
│  │ Gateway.WebService                     │                 │
│  │ • JWT + DPoP validation                │                 │
│  │ • Tenant resolution                    │                 │
│  └────────────────────────────────────────┘                 │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       │ Internal mTLS (optional)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                APPLICATION LAYER (Internal)                 │
│  • Scanner.WebService                                       │
│  • Concelier.WebService                                     │
│  • Policy.Gateway                                           │
│  • Scheduler.WebService                                     │
│  • Notify.WebService                                        │
│  • Orchestrator.WebService                                  │
│                                                             │
│  Network Policy: Only Gateway can initiate connections      │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       │ PostgreSQL protocol (TLS)
                       │ Valkey protocol (TLS optional)
                       │ S3 API (HTTPS)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│                DATA LAYER (Isolated Subnet)                 │
│  • PostgreSQL (private IP only)                             │
│  • Valkey (private IP only)                                 │
│  • RustFS (private IP only)                                 │
│                                                             │
│  Network Policy: No outbound internet, inbound from app     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│            PRIVILEGED SERVICES (Separate Subnet)            │
│  • Authority (TLS 8440)                                     │
│  • Signer (mTLS 8441)                                       │
│  • Attestor (HTTPS 8442)                                    │
│                                                             │
│  Network Policy: mTLS required, audit all access            │
└─────────────────────────────────────────────────────────────┘
```

### Data Encryption

**At Rest:**
- PostgreSQL: Transparent Data Encryption (TDE) or LUKS full-disk
- Valkey: No encryption (ephemeral data only, 30s max TTL for DPoP nonces)
- RustFS: Server-side encryption (SSE-S3 or AES-256)

**In Transit:**
- External: TLS 1.3 (Gateway → clients)
- Internal: Optional mTLS (Gateway → services)
- PostgreSQL: TLS (required in production)
- Valkey: TLS optional (recommended)
- RustFS: HTTPS (required)

### Audit Logging

**All services log to PostgreSQL:**

| Event | Service | Table |
|-------|---------|-------|
| Authentication | Authority | `authority.audit_log` |
| Authorization denials | Gateway | `authority.audit_log` |
| DSSE signing | Signer | `authority.audit_log` (via OpTok validation) |
| Policy exceptions | Policy.Gateway | `policy.exception_objects` (approval trail) |
| Scan triggers | Scanner | `scanner.scan_manifests` (audit columns) |

**Audit Trail Requirements (SOC 2):**
- Who (user/client ID)
- What (action performed)
- When (ISO 8601 timestamp)
- Where (tenant ID, IP address)
- Result (success/failure, reason)
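
One way to carry the five audit fields above is a small immutable record. The sketch below is only an illustration: the `AuditEvent` type and its field names are hypothetical and are not the actual columns of `authority.audit_log`.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    actor: str                    # who: user or client ID
    action: str                   # what: action performed
    timestamp: str                # when: ISO 8601, UTC
    tenant_id: str                # where: tenant
    ip_address: str               # where: source address
    success: bool                 # result
    reason: Optional[str] = None  # result detail, e.g. why a request was denied


def new_audit_event(actor: str, action: str, tenant_id: str, ip_address: str,
                    success: bool, reason: Optional[str] = None,
                    now: Optional[datetime] = None) -> AuditEvent:
    """Build an event, stamping it with the current UTC time unless `now` is given."""
    moment = now or datetime.now(timezone.utc)
    return AuditEvent(actor, action, moment.isoformat(), tenant_id,
                      ip_address, success, reason)


event = new_audit_event(
    "svc-scanner", "sign:dsse", "tenant-a", "10.0.0.7", False,
    reason="PoE expired",
    now=datetime(2025, 12, 23, 12, 0, tzinfo=timezone.utc),
)
print(asdict(event)["timestamp"])  # 2025-12-23T12:00:00+00:00
```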

**Retention:**
- Audit logs: 90 days minimum (configurable per tenant)
- Compliance mode: 7 years retention for regulated industries

---

## Summary

**Key Architectural Principles:**

1. **Schema Isolation:** Each service owns its PostgreSQL schema, no cross-schema foreign keys
2. **Event-Driven:** Valkey Streams for async communication (scan jobs, notifications)
3. **Webhook Integration:** Concelier/Excititor → Scheduler for delta events
4. **Append-Only Data:** AOC for advisories and VEX (immutable, audit-friendly)
5. **Strong Authentication:** JWT + DPoP for all API calls, OpTok for service-to-service
6. **Hard Gates:** Signer enforces licensing and scanner authenticity
7. **Multi-Tenancy:** Tenant ID in all data, tenant-scoped logical replication
8. **Transparency:** Rekor v2 for public auditability, offline bundles for airgap

**Communication Patterns:**

| Pattern | Technology | Use Case |
|---------|------------|----------|
| Synchronous HTTP | REST APIs | Scanner → Concelier linkset queries |
| Asynchronous Queue | Valkey Streams | Scanner jobs, Notify delivery |
| Event Publishing | Valkey Streams | `report.ready`, `drift.detected` |
| Webhooks | HTTP POST | Concelier/Excititor → Scheduler |
| Database Replication | PostgreSQL Logical Replication | Policy Engine advisory/VEX data |
| Object Storage | S3 API (RustFS) | SBOM artifacts, proof bundles |

**Security Model:**
- **Gateway:** Enforces authentication, authorization, rate limiting
- **Authority:** Issues OpToks with DPoP binding (sender constraint)
- **Signer:** Hard gates on PoE and scanner authenticity
- **Tenant Isolation:** All queries filtered by `tenant_id`
- **Audit Trails:** All privileged actions logged to PostgreSQL

This architecture provides **deterministic, reproducible vulnerability scanning** with **strong cryptographic provenance** (DSSE + Rekor), **multi-tenant isolation**, and **VEX-first decisioning** for exploitability analysis.

---

**For More Information:**
- [Developer Onboarding](./DEVELOPER_ONBOARDING.md) - Quick start guide
- [High-Level Architecture](./07_HIGH_LEVEL_ARCHITECTURE.md) - Business-level overview
- [API/CLI Reference](./09_API_CLI_REFERENCE.md) - Endpoint documentation
- [Offline Kit](./24_OFFLINE_KIT.md) - Airgap deployment guide