From 396e9b75a4dd31bf5e4de0bbd80897ffb64e8059 Mon Sep 17 00:00:00 2001
From: master <>
Date: Tue, 23 Dec 2025 11:05:55 +0200
Subject: [PATCH] docs: Add comprehensive component architecture documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created detailed architectural documentation showing component interactions, communication patterns, and data flows across all StellaOps services.

## New Documentation

**docs/ARCHITECTURE_DETAILED.md** - Comprehensive architecture guide:
- Component topology diagram (all 36+ services)
- Infrastructure layer details (PostgreSQL, Valkey, RustFS, NATS)
- Service-by-service catalog with responsibilities
- Communication patterns with WHY (business purpose)
- 5 detailed data flow diagrams:
  1. Scan Request Flow (CLI → Scanner → Worker → Policy → Signer → Attestor → Notify)
  2. Advisory Update Flow (Concelier → Scheduler → Scanner re-evaluation)
  3. VEX Update Flow (Excititor → IssuerDirectory → Scheduler → Policy)
  4. Notification Delivery Flow (Scanner → Valkey → Notify → Slack/Teams/Email)
  5. Policy Evaluation Flow (Scanner → Policy.Gateway → OPA → PostgreSQL replication)
- Database schema isolation details per service
- Security boundaries and authentication flows

## Updated Documentation

**docs/DEVELOPER_ONBOARDING.md**:
- Added link to detailed architecture
- Simplified overview with component categories
- Quick reference topology tree

**docs/07_HIGH_LEVEL_ARCHITECTURE.md**:
- Updated infrastructure requirements section
- Clarified PostgreSQL as the ONLY database
- Emphasized Valkey as REQUIRED (not optional)
- Marked NATS as optional (Valkey is default transport)

**docs/README.md**:
- Added link to detailed architecture in navigation

## Key Architectural Insights Documented

**Communication Patterns:**
- 11 communication steps in scan flow (Gateway → Scanner → Valkey → Worker → Concelier → Policy → Signer → Attestor → Valkey → Notify → Slack)
- PostgreSQL logical replication (advisory_raw_stream, vex_raw_stream → Policy Engine)
- Valkey Streams for async job queuing (XADD/XREADGROUP pattern)
- HTTP webhooks for delta events (Concelier/Excititor → Scheduler)

**Security Boundaries:**
- Authority issues OpToks with DPoP binding (RFC 9449)
- Signer enforces PoE validation + scanner digest verification
- All services validate JWT + DPoP on every request
- Tenant isolation via tenant_id in all PostgreSQL queries

**Database Patterns:**
- 8 dedicated PostgreSQL schemas (authority, scanner, vuln, vex, scheduler, notify, policy, orchestrator)
- Append-only advisory/VEX storage (AOC - Aggregation-Only Contract)
- BOM-Index for impact selection (CVE → PURL → image mapping)

This documentation provides complete visibility into who calls whom, why they communicate, what data flows through the system, and how security is enforced at every layer.
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5
---
 docs/07_HIGH_LEVEL_ARCHITECTURE.md | 30 +-
 docs/ARCHITECTURE_DETAILED.md | 1518 +++++++++++++++++
 docs/DEVELOPER_ONBOARDING.md | 57 +-
 ...T_4300_0002_0001_unknowns_budget_policy.md | 0
 ...02_0002_unknowns_attestation_predicates.md | 0
 ...300_0003_0001_sealed_knowledge_snapshot.md | 0
 6 files changed, 1596 insertions(+), 9 deletions(-)
 create mode 100644 docs/ARCHITECTURE_DETAILED.md
 rename docs/implplan/{ => archived}/SPRINT_4300_0002_0001_unknowns_budget_policy.md (100%)
 rename docs/implplan/{ => archived}/SPRINT_4300_0002_0002_unknowns_attestation_predicates.md (100%)
 rename docs/implplan/{ => archived}/SPRINT_4300_0003_0001_sealed_knowledge_snapshot.md (100%)

diff --git a/docs/07_HIGH_LEVEL_ARCHITECTURE.md b/docs/07_HIGH_LEVEL_ARCHITECTURE.md
index e6ceb3175..c49213601 100755
--- a/docs/07_HIGH_LEVEL_ARCHITECTURE.md
+++ b/docs/07_HIGH_LEVEL_ARCHITECTURE.md
@@ -50,15 +50,29 @@
 | **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, vulnerability triage (artifact-first), audit bundles, **Scheduler**, **Notify**, runtime, reports. | Stateless. |
 | **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; **schedule** and **notify** verbs. | Local/CI. |
 
-### 1.2 Third‑party (self‑hosted)
+### 1.2 Infrastructure Requirements
 
-* **Fulcio** (Sigstore CA) — issues short‑lived signing certs (keyless).
-* **Rekor v2** (tile‑backed transparency log).
-* **RustFS** — offline-first object store with deterministic REST API; S3/MinIO compatibility layer available for legacy deployments.
-* **PostgreSQL** (≥16) — primary control-plane storage with per-module schema isolation (authority, vuln, vex, scheduler, notify, policy, concelier). See [Database Architecture](#database-architecture-postgresql).
-* **Valkey** (≥8.0) — Redis-compatible cache for DPoP nonces, event streams, queues, and rate limiting.
-* **Queue** — Valkey Streams (default); NATS JetStream available as optional transport (opt-in only).
-* **OCI Registry** — must support **Referrers API** (discover SBOMs/signatures).
+**REQUIRED Infrastructure:**
+
+* **PostgreSQL** (≥16) — **ONLY** database for all persistent data. Per-module schema isolation (authority, vuln, vex, scanner, scheduler, notify, policy, orchestrator). See [Database Architecture](#database-architecture-postgresql).
+* **Valkey** (≥8.0) — Redis-compatible store for caching, DPoP nonces (RFC 9449), event streams, job queues, and rate limiting. **REQUIRED** for platform operation.
+* **RustFS** — S3-compatible object storage for SBOM artifacts, proof bundles, and scan evidence. HTTP API with deterministic responses.
+
+**OPTIONAL Infrastructure:**
+
+* **NATS JetStream** — Alternative messaging transport (Valkey Streams are the default). Opt-in only via configuration.
+
+**External Dependencies:**
+
+* **OCI Registry** — Must support **Referrers API** (discover SBOMs/signatures).
+* **Fulcio** (Sigstore CA) — Issues short-lived signing certs (keyless signing). Optional if using KMS keys.
+* **Rekor v2** — Tile-backed transparency log. Optional if `OFFLINEKIT_ENABLED=true` (airgap mode).
+
+**Architecture Note:**
+- PostgreSQL is the ONLY database (MongoDB fully removed as of 2025-12-23)
+- Valkey replaces Redis (drop-in compatible, but required)
+- RustFS is the primary object storage (MinIO removed)
+- NATS is OPTIONAL, not required (Valkey Streams handle queuing)
 
 ### 1.3 Cloud licensing (Stella Ops)
 
diff --git a/docs/ARCHITECTURE_DETAILED.md b/docs/ARCHITECTURE_DETAILED.md
new file mode 100644
index 000000000..4b5eb6372
--- /dev/null
+++ b/docs/ARCHITECTURE_DETAILED.md
@@ -0,0 +1,1518 @@
+# StellaOps Platform - Detailed Architecture
+
+**Last Updated:** 2025-12-23
+**Purpose:** Comprehensive component architecture with communication patterns and data flows
+
+## Table of Contents
+
+1. [Component Topology](#component-topology)
+2.
[Infrastructure Layer](#infrastructure-layer) +3. [Service Catalog](#service-catalog) +4. [Communication Patterns](#communication-patterns) +5. [Data Flow Diagrams](#data-flow-diagrams) +6. [Database Schema Isolation](#database-schema-isolation) +7. [Security Boundaries](#security-boundaries) + +--- + +## Component Topology + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ CLIENT LAYER │ +│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ +│ │ stella │ │ Web UI │ │ CI/CD │ │ Zastava │ │ +│ │ CLI │ │ Angular │ │ Pipeline │ │ Observer │ │ +│ └─────┬────┘ └─────┬────┘ └─────┬────┘ └─────┬────┘ │ +│ │ │ │ │ │ +└────────┼─────────────┼─────────────┼─────────────┼──────────────────────────┘ + │ │ │ │ + └─────────────┴─────────────┴─────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ GATEWAY LAYER │ +│ ┌───────────────────────────────────────────────────────────────┐ │ +│ │ Gateway.WebService │ │ +│ │ • JWT validation • Rate limiting │ │ +│ │ • DPoP verification • Request routing │ │ +│ │ • Tenant resolution • Correlation tracking │ │ +│ └───┬────────────────────────────────────────────────┬───────────┘ │ +│ │ │ │ +└──────┼────────────────────────────────────────────────┼─────────────────────┘ + │ │ + ▼ ▼ +┌─────────────────┐ ┌─────────────────┐ +│ AUTHORITY │◄───────────────────────────│ ALL SERVICES │ +│ │ OpTok validation │ (Resource │ +│ • OAuth2/OIDC │ DPoP nonce verification │ servers) │ +│ • DPoP binding │ │ │ +│ • OpTok issue │ └─────────────────┘ +│ • mTLS verify │ +└────────┬────────┘ + │ stores tokens, + │ audit trails + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ CORE SERVICES LAYER │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ SCANNING ENGINE │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Scanner.WebService │────────▶│ Scanner.Worker │ │ │ +│ │ │ │ 
Valkey │ │ │ │ +│ │ │ • Scan orchestrate │ queue │ • Layer analysis │ │ │ +│ │ │ • Report catalog │ │ • SBOM generation │ │ │ +│ │ │ • Policy eval │ │ • Reachability │ │ │ +│ │ └─────┬──────────────┘ └────────┬───────────┘ │ │ +│ │ │ │ │ │ +│ │ │ linkset │ artifact │ │ +│ │ │ query │ upload │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Concelier │ │ RustFS │ │ │ +│ │ │ WebService │ │ (S3 API) │ │ │ +│ │ └──────────────┘ └──────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ ADVISORY INGESTION ENGINE │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Concelier.WebService│──────▶│ Concelier.Worker │ │ │ +│ │ │ │ Jobs │ │ │ │ +│ │ │ • Ingest advisories│ │ • Connector fetch │ │ │ +│ │ │ • Compute linksets │ │ • Normalize data │ │ │ +│ │ │ • AOC enforcement │ │ • Delta detection │ │ │ +│ │ └─────┬──────────────┘ └────────────────────┘ │ │ +│ │ │ │ │ +│ │ │ webhook: advisory delta events │ │ +│ │ ▼ │ │ +│ │ ┌──────────────┐ │ │ +│ │ │ Scheduler │ │ │ +│ │ │ WebService │ │ │ +│ │ └──────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ VEX INGESTION ENGINE │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Excititor.WebService│──────▶│ Excititor.Worker │ │ │ +│ │ │ │ Jobs │ │ │ │ +│ │ │ • Ingest VEX │ │ • Fetch VEX feeds │ │ │ +│ │ │ • DSSE verify │ │ • Trust verify │ │ │ +│ │ │ • Consensus calc │ │ • Signature check │ │ │ +│ │ └─────┬──────────────┘ └──────┬─────────────┘ │ │ +│ │ │ │ │ │ +│ │ │ webhook: VEX delta │ trust lookup │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ Scheduler │ │ Issuer │ │ │ +│ │ │ WebService │ │ Directory │ │ │ +│ │ └──────────────┘ └──────────────┘ │ │ +│ 
└─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ ORCHESTRATION & SCHEDULING │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Scheduler.WebService│──────▶│ Scheduler.Worker │ │ │ +│ │ │ │ Jobs │ │ │ │ +│ │ │ • Impact select │ │ • Re-scan trigger │ │ │ +│ │ │ • Rate limit │ │ • Batch enforce │ │ │ +│ │ │ • Maintenance win │ │ • Progress track │ │ │ +│ │ └─────┬──────────────┘ └──────┬─────────────┘ │ │ +│ │ │ │ │ │ +│ │ │ │ HTTP: enqueue scan │ │ +│ │ │ ▼ │ │ +│ │ │ ┌──────────────┐ │ │ +│ │ │ │ Scanner.Web │ │ │ +│ │ │ └──────────────┘ │ │ +│ │ │ │ │ +│ │ ┌─────▼──────────────┐ │ │ +│ │ │ Orchestrator.Web │ │ │ +│ │ │ │ │ │ +│ │ │ • DAG workflows │ │ │ +│ │ │ • Pack runs │ │ │ +│ │ │ • Job streaming │ │ │ +│ │ └────────────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ NOTIFICATION ENGINE │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Notify.WebService │────────▶│ Notify.Worker │ │ │ +│ │ │ │ Valkey │ │ │ │ +│ │ │ • Channel mgmt │ Streams │ • Slack delivery │ │ │ +│ │ │ • Template engine │ XADD/ │ • Teams delivery │ │ │ +│ │ │ • Throttle/digest │ XREAD │ • Email delivery │ │ │ +│ │ └─────▲──────────────┘ └──────┬─────────────┘ │ │ +│ │ │ │ │ │ +│ │ │ report.ready events │ External HTTP/SMTP │ │ +│ │ │ ▼ │ │ +│ │ ┌─────┴──────────────┐ ┌──────────────┐ │ │ +│ │ │ Scanner.Web │ │ Slack API │ │ │ +│ │ │ (events) │ │ Teams API │ │ │ +│ │ └────────────────────┘ │ SMTP │ │ │ +│ │ └──────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ CRYPTOGRAPHIC SERVICES │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Signer.WebService │────────▶│ 
Attestor.WebService│ │ │ +│ │ │ │ mTLS │ │ │ │ +│ │ │ • DSSE signing │ OpTok │ • Rekor v2 submit │ │ │ +│ │ │ • PoE validation │ │ • Receipt verify │ │ │ +│ │ │ • Multi-profile │ │ • Offline bundles │ │ │ +│ │ │ FIPS/GOST/SM │ │ │ │ │ +│ │ └─────┬──────────────┘ └──────┬─────────────┘ │ │ +│ │ │ │ │ │ +│ │ │ KMS/PKCS11 │ External │ │ +│ │ ▼ ▼ │ │ +│ │ ┌──────────────┐ ┌──────────────┐ │ │ +│ │ │ External KMS │ │ Rekor v2 │ │ │ +│ │ │ (AWS/GCP) │ │ (Sigstore) │ │ │ +│ │ └──────────────┘ └──────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ POLICY ENGINE │ │ +│ │ │ │ +│ │ ┌────────────────────┐ ┌────────────────────┐ │ │ +│ │ │ Policy.Gateway │────────▶│ Policy Engine │ │ │ +│ │ │ │ HTTP │ (OPA/Rego) │ │ │ +│ │ │ • Exception mgmt │ │ │ │ │ +│ │ │ • Approval flow │ │ • Rule eval │ │ │ +│ │ │ • Delta compute │ │ • Verdict compute │ │ │ +│ │ └─────▲──────────────┘ └──────▲─────────────┘ │ │ +│ │ │ │ │ │ +│ │ │ policy eval request │ PostgreSQL │ │ +│ │ │ │ logical replication │ │ +│ │ ┌─────┴──────────────┐ │ │ │ +│ │ │ Scanner.Web │ ┌─────┴──────────┐ │ │ +│ │ │ (verdict request) │ │ advisory_raw │ │ │ +│ │ └────────────────────┘ │ vex_raw │ │ │ +│ │ │ (streams) │ │ │ +│ │ └────────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────┘ + +┌─────────────────────────────────────────────────────────────────────────────┐ +│ INFRASTRUCTURE LAYER │ +│ │ +│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ +│ │ PostgreSQL │ │ Valkey │ │ RustFS │ │ +│ │ v16+ │ │ v8.0 │ │ (S3-compatible) │ │ +│ │ │ │ │ │ │ │ +│ │ • Per-service │ │ • DPoP nonces │ │ • SBOM artifacts │ │ +│ │ schemas │ │ • Event streams │ │ • Proof bundles │ │ +│ │ • Logical │ │ • Job queues │ │ • CAS storage │ │ +│ │ replication │ │ • Cache │ │ │ │ +│ │ • REQUIRED │ 
│ • REQUIRED │ │ • REQUIRED │ │
+│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
+│ │
+│ ┌──────────────────┐ │
+│ │ NATS │ │
+│ │ JetStream │ │
+│ │ │ │
+│ │ • Message queue │ │
+│ │ • Optional │ │
+│ │ (Valkey is │ │
+│ │ default) │ │
+│ └──────────────────┘ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Infrastructure Layer
+
+### PostgreSQL v16+ (REQUIRED)
+
+**Purpose:** Primary database for ALL persistent data
+
+**Schema Isolation:**
+
+| Schema | Owner Service | Purpose |
+|--------|---------------|---------|
+| `authority` | Authority | Users, clients, tenants, keys, audit trails |
+| `scanner` | Scanner | Scan manifests, triage, EPSS, reachability graphs |
+| `vuln` | Concelier | Advisory raw documents, linksets, observations |
+| `vex` | Excititor | VEX raw documents, consensus, provider state |
+| `scheduler` | Scheduler | Graph jobs, runs, schedules, impact snapshots |
+| `notify` | Notify | Channels, templates, delivery history, digests |
+| `policy` | Policy.Gateway | Exception objects, snapshots, unknowns |
+| `orchestrator` | Orchestrator | Sources, runs, jobs, DAGs, pack runs |
+
+**Special Features:**
+- **Logical Replication:** `advisory_raw_stream`, `vex_raw_stream` → Policy Engine
+- **Per-tenant isolation:** Tenant ID in all tables for row-level security
+- **Append-only patterns:** AOC (Aggregation-Only Contract) for advisory/VEX immutability
+
+### Valkey v8.0 (REQUIRED)
+
+**Purpose:** Cache, DPoP security, event streams, job queues
+
+**Use Cases:**
+
+| Pattern | Services | Purpose |
+|---------|----------|---------|
+| DPoP nonces | Authority | RFC 9449 nonce storage (30s TTL) |
+| Event streams | Scanner, Notify, Scheduler | XADD for `report.ready`, drift events |
+| Job queues | Scanner, Notify | XREADGROUP for worker coordination |
+| Cache | All services | Distributed caching with tenant prefixes |
+| Rate limiting | Gateway, Authority | Token bucket counters |
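The rate-limiting row names the technique (token bucket counters) without showing what the counters hold. The sketch below is a minimal illustration under stated assumptions: a classic token bucket with a burst capacity and a per-second refill rate, keyed per tenant and client. It keeps state in a Python dict instead of Valkey. In a Valkey-backed deployment the `(tokens, last_update)` pair would presumably live under a tenant-prefixed key and be updated atomically (for example via a server-side Lua script), which is not shown. `TokenBucketLimiter`, `rate_limit_key`, the `ratelimit:` prefix and all parameter names are hypothetical, not StellaOps identifiers.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


def rate_limit_key(tenant_id: str, client_id: str) -> str:
    """Hypothetical key layout: tenant prefix first, so tenants never share a bucket."""
    return f"ratelimit:{tenant_id}:{client_id}"


@dataclass
class TokenBucketLimiter:
    """In-memory token bucket per key (sketch; not the Valkey-backed implementation)."""

    capacity: float        # maximum burst size, in tokens
    refill_per_sec: float  # sustained rate, in tokens per second
    clock: Callable[[], float] = time.monotonic  # injectable for deterministic tests
    # key -> (tokens remaining, timestamp of last update)
    _state: Dict[str, Tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def allow(self, key: str, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if the bucket holds enough, else False."""
        now = self.clock()
        # A key seen for the first time starts with a full bucket.
        tokens, last = self._state.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, never exceeding capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= cost:
            self._state[key] = (tokens - cost, now)
            return True
        # Rejected: keep the refilled amount, consume nothing.
        self._state[key] = (tokens, now)
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = TokenBucketLimiter(capacity=3, refill_per_sec=1.0, clock=lambda: fake_now[0])
    key = rate_limit_key("tenant-a", "cli-01")
    print([limiter.allow(key) for _ in range(4)])  # burst of 3 allowed, 4th rejected
    fake_now[0] = 1.0                               # one second later, one token refilled
    print(limiter.allow(key))
```

The capacity bounds how large a burst a client can send at once, while the refill rate bounds its sustained request rate; the tenant prefix in the key keeps one tenant's traffic from draining another tenant's bucket.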
+
+**Default Transport:** Valkey Streams are preferred over NATS for queuing
+
+### RustFS (REQUIRED)
+
+**Purpose:** S3-compatible object storage for artifacts
+
+**Buckets:**
+
+| Bucket | Services | Content |
+|--------|----------|---------|
+| `scanner-artifacts` | Scanner | Layer SBOMs, composed SBOMs, proof bundles |
+| `surface-cache` | Scanner.Worker | Extracted filesystem surfaces |
+| `evidence-locker` | Evidence Locker | Immutable audit evidence |
+| `cas-replay` | Replay Engine | Content-addressed snapshots |
+
+**API:** HTTP/S3 with optional API key authentication
+
+### NATS JetStream (OPTIONAL)
+
+**Purpose:** Alternative messaging transport (not default)
+
+**When to Use:**
+- High-throughput environments requiring persistent streams
+- Multi-datacenter replication scenarios
+- When Valkey Streams are insufficient for the required scale
+
+**Default:** Valkey is preferred; NATS is opt-in via configuration
+
+---
+
+## Service Catalog
+
+### Gateway Layer
+
+#### Gateway.WebService
+
+**Port:** 8080 (HTTP), 8443 (HTTPS)
+**Dependencies:** Authority (JWT validation), Backend services (routing)
+
+**Responsibilities:**
+- **Authentication:** JWT + DPoP verification on all requests
+- **Authorization:** Scope-based access control (RBAC claims)
+- **Tenant Resolution:** Multi-tenant routing via `X-Tenant-Id` header or JWT
+- **Rate Limiting:** Per-client token bucket (Valkey-backed)
+- **Request Routing:** Routes to Scanner, Concelier, Policy, Scheduler, Notify
+- **Correlation Tracking:** Injects `X-Correlation-Id` for distributed tracing
+
+**Security Boundaries:**
+- TLS termination (mutual TLS optional)
+- DPoP sender constraint validation
+- OpTok refresh on expiry
+
+---
+
+### Authentication & Security
+
+#### Authority
+
+**Port:** 8440 (HTTPS)
+**Database:** `authority` schema
+**Dependencies:** Valkey (DPoP nonces), External LDAP/OIDC (plugins)
+
+**Responsibilities:**
+- **OAuth 2.1 Server:** Issues OpToks (operational tokens) with DPoP binding
+- **Client Credentials Flow:** Machine-to-machine authentication
+- **Resource Owner Password Flow:** User authentication with LDAP/OIDC
+- **DPoP (RFC 9449):** Sender-constrained tokens with nonce validation
+- **mTLS:** Certificate-based client authentication
+- **Audit Trails:** All authentication events logged to PostgreSQL
+- **Multi-Tenancy:** Tenant-scoped token issuance
+
+**Token Types:**
+- **OpTok:** Short-lived (15 min), DPoP-bound, scoped access token
+- **Refresh Token:** Rotation-protected, 7-day expiry
+- **ID Token:** OIDC identity claims
+
+**Security:**
+- DPoP nonces stored in Valkey with 30s TTL
+- OpTok signatures verified by all resource servers
+- Rate limiting on failed login attempts
+
+#### Signer.WebService
+
+**Port:** 8441 (HTTPS with mTLS)
+**Dependencies:** Authority (PoE validation), External KMS (optional), OCI Registry (scanner digest verification)
+
+**Responsibilities:**
+- **DSSE Signing:** Signs DSSE envelopes carrying in-toto statements for SBOMs, VEX, and attestations
+- **PoE Validation:** Validates Proof-of-Entitlement (license check)
+- **Multi-Profile Keys:** FIPS, GOST (CryptoPro), SM (Chinese national crypto)
+- **Scanner Authenticity:** Verifies that the scanner image digest is signed by Stella Ops
+- **Key Management:** HSM/KMS integration (AWS, GCP, PKCS11)
+
+**Hard Gates (Reject on Failure):**
+1. OpTok validation (DPoP + mTLS)
+2. PoE license check
+3. Scanner image digest verification (cosign signature)
+
+**Key Profiles:**
+
+| Profile | Algorithm | Use Case |
+|---------|-----------|----------|
+| `default` | ECDSA P-256 | Standard signing |
+| `fips` | ECDSA P-384 | FIPS 140-2 compliance |
+| `gost` | GOST R 34.10-2012 | Russian regulations |
+| `sm` | SM2 | Chinese regulations |
+
+#### Attestor.WebService
+
+**Port:** 8442 (HTTPS)
+**Dependencies:** Signer (DSSE signing), Rekor v2 (transparency log)
+
+**Responsibilities:**
+- **Rekor Submission:** Posts DSSE bundles to Sigstore Rekor v2
+- **Receipt Retrieval:** Fetches inclusion proofs from Rekor
+- **Offline Bundles:** Generates offline verification bundles
+- **Verification:** Validates Rekor receipts for CLI/CI
+
+**Workflow:**
+1. Receive DSSE envelope from Scanner/Excititor
+2. Call Signer for signature (mTLS)
+3. Submit signed DSSE to Rekor v2
+4. Retrieve inclusion proof
+5. Return receipt to caller
+
+**Offline Mode:**
+- Optional: Can operate without Rekor if `OFFLINEKIT_ENABLED=true`
+- Uses local timestamp service for non-repudiation
+
+---
+
+### Scanning Engine
+
+#### Scanner.WebService
+
+**Port:** 8444 (HTTP)
+**Database:** `scanner` schema
+**Object Storage:** RustFS `scanner-artifacts` bucket
+**Dependencies:** Authority (auth), Concelier (linkset queries), Policy (evaluation), Signer (DSSE), Attestor (Rekor)
+
+**Responsibilities:**
+- **Scan Orchestration:** Enqueues scan jobs to Scanner.Worker via Valkey
+- **Report Catalog:** Maintains scan history, triage data, policy verdicts
+- **Linkset Enrichment:** Queries Concelier for advisory linksets by PURL/CPE
+- **Policy Evaluation:** Calls Policy.Gateway for verdict computation
+- **SBOM Export:** Generates SPDX 3.0.1 and CycloneDX 1.6 SBOMs
+- **VEX Export:** Calls Excititor for VEX statement generation
+- **Proof Bundles:** Assembles DSSE envelopes with signatures + Rekor receipts
+- **Event Publishing:** Emits `report.ready` events to Notify via Valkey Streams
+
+**API Endpoints:**
+
+| Endpoint | Method | Purpose |
+|----------|--------|---------|
+| `/v1/scans` | POST | Enqueue scan job |
+| `/v1/scans/{id}` | GET | Retrieve scan report |
+| `/v1/scans/{id}/sbom` | GET | Download SBOM (SPDX/CycloneDX) |
+| `/v1/scans/{id}/vex` | GET | Download VEX document |
+| `/v1/scans/{id}/proof` | GET | Download proof bundle (DSSE + Rekor receipt) |
+| `/v1/triage` | POST | Mark finding as false positive |
+
+**Queue Pattern:**
+- Publishes to Valkey Stream: `scanner:jobs`
+- Scanner.Worker consumes via `XREADGROUP`
+
+#### Scanner.Worker
+
+**Database:** `scanner` schema (read EPSS, write inventory)
+**Object Storage:** RustFS `scanner-artifacts`, `surface-cache`
+**Dependencies:** Scanner.WebService (internal API), RustFS (upload)
+
+**Responsibilities:**
+- **Image Pull:** OCI image download and layer extraction
+- **Layer Analysis:** Runs OS/language/native analyzers per layer
+- **SBOM Generation:** Per-layer SBOMs in SPDX 3.0.1 format
+- **Composition:** Merges layer SBOMs into final composed SBOM
+- **Reachability Analysis:** Call-graph extraction for Java/Node/Go/Python
+- **Artifact Upload:** Uploads SBOMs to RustFS
+- **Progress Reporting:** Heartbeat to Scanner.WebService every 10s
+
+**Analyzers:**
+
+| Analyzer | Ecosystem | Method |
+|----------|-----------|--------|
+| `distro-debian` | Debian/Ubuntu | `dpkg-query`, `apt-cache` |
+| `distro-rpm` | RHEL/Fedora/CentOS | RPM database |
+| `distro-alpine` | Alpine Linux | APK database |
+| `lang-java` | Java/Maven/Gradle | JAR manifests, `pom.xml`, `build.gradle` |
+| `lang-node` | Node.js/npm | `package.json`, `package-lock.json` |
+| `lang-python` | Python/pip | `requirements.txt`, `Pipfile`, wheel metadata |
+| `lang-go` | Golang | `go.mod`, binary parsing |
+| `native` | C/C++ | ELF symbol tables, version symbols |
+
+**Reachability:**
+- Builds call graph (CG) with nodes/edges in PostgreSQL `cg_node`, `cg_edge`
+- Determines whether vulnerable functions are callable from entrypoints
+- Flags findings as `REACHABLE`/`UNREACHABLE`/`UNKNOWN`
+
+---
+
+### Advisory Ingestion
+
+#### Concelier.WebService
+
+**Port:** 8445 (HTTP)
+**Database:** `vuln` schema (`advisory_raw`, `linksets`)
+**Dependencies:** Scheduler (webhook for delta events), Upstream sources (connectors)
+
+**Responsibilities:**
+- **Advisory Ingestion:** Fetches vulnerability advisories from NVD, Red Hat, Debian, Ubuntu, GitHub, etc.
+- **Normalization:** Converts vendor formats to canonical Concelier advisory JSON
+- **Linkset Computation:** Maps CVE IDs to PURLs/CPEs with version ranges
+- **AOC Enforcement:** Append-only writes to `advisory_raw` (immutable after insert)
+- **Delta Detection:** Detects new advisories and emits webhook to Scheduler
+- **Merge Engine:** Deduplicates advisories across sources with priority rules
+
+**Connectors:**
+
+| Connector | Source | Update Frequency |
+|-----------|--------|------------------|
+| `nvd` | NVD CVE JSON | Hourly |
+| `redhat` | Red Hat OVAL | Every 6 hours |
+| `debian` | Debian Security Tracker | Every 6 hours |
+| `ubuntu` | Ubuntu CVE Tracker | Every 6 hours |
+| `github` | GitHub Advisory Database | Hourly |
+| `alpine` | Alpine SecDB | Every 6 hours |
+| `osv` | OSV.dev | Hourly |
+
+**Linkset API:**
+- `/v1/lnm/linksets/{advisoryId}` - Returns PURL/CPE mappings for a CVE
+- Consumed by Scanner for enrichment
+
+**PostgreSQL Logical Replication:**
+- `advisory_raw_stream` → Policy Engine (tenant-scoped replication)
+
+#### Concelier.Worker
+
+**Dependencies:** Concelier.WebService (internal API), Upstream advisory sources
+
+**Responsibilities:**
+- **Scheduled Fetching:** Polls connectors on cron schedules
+- **Delta Computation:** Compares fetched data with last snapshot
+- **Advisory Normalization:** Parses OVAL, JSON, XML into canonical format
+- **Database Insert:** Writes to `advisory_raw` via Concelier.WebService API
+
+---
+
+### VEX Ingestion
+
+#### Excititor.WebService
+
+**Port:** 8446 (HTTP)
+**Database:** `vex` schema (`vex_raw`, `consensus`)
+**Dependencies:** IssuerDirectory (trust verification), Scheduler (webhook for delta events)
+
+**Responsibilities:**
+- **VEX Ingestion:** Fetches OpenVEX and CSAF VEX documents from vendors
+- **DSSE Verification:** Validates DSSE signatures on in-toto VEX statements
+- **Trust Scoring:** Applies trust weights to issuers from IssuerDirectory
+- **Consensus Computation:** Resolves conflicting VEX statements from multiple issuers into a single status
+- **AOC Enforcement:** Append-only writes to `vex_raw` (immutable after insert)
+- **Delta Detection:** Detects new VEX statements and emits webhook to Scheduler
+
+**VEX Sources:**
+
+| Source | Format | Signature |
+|--------|--------|-----------|
+| Red Hat VEX | CSAF VEX | PGP-signed |
+| CISA VEX | OpenVEX | DSSE in-toto |
+| Vendor VEX | OpenVEX | DSSE in-toto |
+
+**Consensus Algorithm:**
+- Weighted voting based on issuer trust scores
+- Tie-breaking: Most conservative status wins (e.g., `affected` > `not_affected`)
+- Result stored in `consensus` table with provenance
+
+**PostgreSQL Logical Replication:**
+- `vex_raw_stream` → Policy Engine (tenant-scoped replication)
+
+#### Excititor.Worker
+
+**Dependencies:** Excititor.WebService (internal API), IssuerDirectory (trust lookup)
+
+**Responsibilities:**
+- **Scheduled Fetching:** Polls VEX sources on cron schedules
+- **Signature Verification:** Validates DSSE envelopes via IssuerDirectory
+- **Trust Verification:** Checks that the issuer is on the trusted list
+- **Database Insert:** Writes to `vex_raw` via Excititor.WebService API
+
+---
+
+### Policy Engine
+
+#### Policy.Gateway
+
+**Port:** 8447 (HTTP)
+**Database:** `policy` schema (`exception_objects`, `snapshots`, `unknowns`)
+**Dependencies:** Policy Engine (OPA/Rego), Authority (auth)
+
+**Responsibilities:**
+- **Policy Evaluation Gateway:** Proxies requests to OPA/Rego engine
+- **Exception Management:** Stores approved false positives, waivers
+- **Approval Workflows:** Multi-stage approval for policy exceptions
+- **Delta Computation:** Compares baseline vs. current scan for policy drift
+- **Unknowns Tracking:** Records unresolved CVEs (no fix available)
+
+**API Endpoints:**
+
+| Endpoint | Method | Purpose |
+|----------|--------|---------|
+| `/v1/policy/evaluate` | POST | Evaluate policy against scan results |
+| `/v1/policy/exceptions` | POST | Create exception request |
+| `/v1/policy/exceptions/{id}/approve` | POST | Approve exception |
+| `/v1/policy/unknowns` | GET | List unresolved findings |
+
+**Policy Data Sources:**
+- PostgreSQL logical replication from `advisory_raw_stream`, `vex_raw_stream`
+- Real-time advisory and VEX data for policy evaluation
+
+#### Policy Engine (OPA/Rego)
+
+**Container:** Separate OPA container, called via HTTP by Policy.Gateway
+**Language:** Rego policies
+**Data Sources:** PostgreSQL logical replication streams
+
+**Policies:**
+- `unknowns-budget.rego` - Limits unresolved CVEs (no fix available)
+- `severity-gates.rego` - Blocks findings based on CVSS severity
+- `reachability-gates.rego` - Allows unreachable findings
+- `vex-override.rego` - Applies VEX `not_affected` status
+
+---
+
+### Orchestration & Scheduling
+
+#### Scheduler.WebService
+
+**Port:** 8448 (HTTP)
+**Database:** `scheduler` schema (`graph_jobs`, `runs`, `schedules`, `impact_snapshots`)
+**Dependencies:** Scanner (re-scan requests), Cartographer (export notifications, optional)
+
+**Responsibilities:**
+- **Impact Selection:** When advisories/VEX change, identifies affected images via BOM-Index
+- **Re-scan Orchestration:** Enqueues re-scan jobs to Scanner.WebService
+- **Rate Limiting:** Enforces concurrent-scan limits and maintenance windows
+- **Schedule Management:** Manages periodic scan schedules (cron)
+- **Webhook Ingestion:** Receives delta events from Concelier, Excititor
+
+**Webhook Endpoints:**
+
+| Endpoint | Source | Payload |
+|----------|--------|---------|
+| `/webhooks/concelier` | Concelier | Advisory delta event |
+| `/webhooks/excititor` | Excititor | VEX delta event |
+
+**Impact Selection Algorithm:**
+1. Receive advisory delta (CVE IDs added)
+2. Query BOM-Index for images containing affected PURLs
+3. Batch impacted images (max 100 per run)
+4. Enforce rate limits and maintenance windows
+5. Enqueue re-scans to Scanner.WebService
+
+#### Scheduler.Worker
+
+**Dependencies:** Scheduler.WebService (internal API), Scanner.WebService (HTTP)
+
+**Responsibilities:**
+- **Job Execution:** Claims jobs from Scheduler.WebService
+- **Batch Processing:** Processes impacted image batches
+- **Re-scan Trigger:** HTTP POST to Scanner `/v1/scans` with `rescan=true`
+- **Progress Reporting:** Heartbeat to Scheduler every 10s
+
+#### Orchestrator.WebService
+
+**Port:** 8449 (HTTP)
+**Database:** `orchestrator` schema (`sources`, `runs`, `jobs`, `dags`, `pack_runs`)
+
+**Responsibilities:**
+- **DAG Workflows:** Manages job dependencies expressed as directed acyclic graphs (DAGs)
+- **Pack Runs:** Bundles multiple jobs into atomic runs
+- **Job Streaming:** WebSocket endpoints for real-time job status
+- **Worker Coordination:** Job claim, heartbeat, completion tracking
+
+**Use Cases:**
+- Complex multi-step workflows (e.g., scan → policy → VEX → attest)
+- Batch operations (e.g., scan all images in a namespace)
+
+---
+
+### Notification Engine
+
+#### Notify.WebService
+
+**Port:** 8450 (HTTP)
+**Database:** `notify` schema (`channels`, `templates`, `delivery_history`, `digest_state`)
+**Dependencies:** Valkey (delivery queue), Scanner (event subscription)
+
+**Responsibilities:**
+- **Channel Management:** Configures Slack, Teams, Email, Webhook channels
+- **Template Engine:** Renders notification templates with Liquid syntax
+- **Throttling:** Rate limits notifications (max N per hour per channel)
+- **Digest Mode:** Batches notifications into hourly/daily digests
+- **Event Subscription:** Subscribes to `report.ready` events from Scanner
+
+**Channel Types:**
+
+| Channel | Protocol | Configuration |
+|---------|----------|---------------|
+| Slack | HTTP (Slack API) | Bot token, channel ID |
+| Teams | HTTP (webhook) | Webhook URL |
+| Email | SMTP | SMTP server, credentials |
+| Webhook | HTTP | URL, auth headers |
+
+**Delivery Queue:**
+- Publishes to Valkey Stream: `notify:delivery`
+- Notify.Worker consumes via `XREADGROUP`
+
+#### Notify.Worker
+
+**Dependencies:** Notify.WebService (internal API), External services (Slack/Teams/SMTP)
+
+**Responsibilities:**
+- **Job Claim:** Claims delivery jobs from Valkey queue
+- **Template Rendering:** Renders Liquid templates with event data
+- **Delivery Execution:** HTTP/SMTP delivery with retries and increasing backoff (see Retry Policy)
+- **Idempotency:** Tracks delivery IDs to prevent duplicates
+- **SLO Tracking:** Records delivery latency for P95 monitoring
+
+**Retry Policy:**
+- Max 3 retries
+- Backoff: 1s, 5s, 15s
+- Dead-letter queue after exhaustion
+
+---
+
+### Cryptographic Services
+
+#### Signer.WebService
+
+**Port:** 8441 (HTTPS with mTLS)
+**Dependencies:** Authority (PoE validation), External KMS (AWS/GCP/PKCS11), OCI Registry (digest verification)
+
+**Responsibilities:**
+- **DSSE Signing:** Signs DSSE envelopes carrying in-toto statements (SBOMs, VEX, attestations)
+- **PoE Validation:** License check via Authority introspection
+- **Scanner Authenticity:** Verifies scanner image digest is Stella Ops-signed
+- **Multi-Profile Keys:** FIPS, GOST, SM for regulatory compliance
+- **Key Rotation:** Automated key rotation with overlap period
+
+**Hard Gates:**
+1. OpTok validation (DPoP + mTLS)
+2. PoE license check (fails if expired)
+3. Scanner image digest verification (must be cosign-signed by Stella Ops)
+
+**Key Storage:**
+
+| Storage | Use Case |
+|---------|----------|
+| In-memory | Development |
+| PKCS11 HSM | On-prem production |
+| AWS KMS | AWS cloud deployments |
+| GCP KMS | GCP cloud deployments |
+
+#### Attestor.WebService
+
+**Port:** 8442 (HTTPS)
+**Dependencies:** Signer (DSSE signing), Rekor v2 (transparency log)
+
+**Responsibilities:**
+- **Rekor Submission:** Posts DSSE bundles to Sigstore Rekor v2
+- **Receipt Retrieval:** Fetches inclusion proofs (Merkle tree path)
+- **Offline Bundles:** Packages DSSE + Rekor receipt for airgap verification
+- **Verification API:** Validates Rekor receipts for CLI/CI
+
+**Workflow:**
+1. Receive DSSE envelope from Scanner/Excititor
+2. Call Signer for signature (mTLS with OpTok)
+3. Submit signed DSSE to Rekor v2 (`/api/v2/entries`)
+4. Retrieve inclusion proof from Rekor
+5. Return proof bundle to caller
+
+**Offline Mode:**
+- When `OFFLINEKIT_ENABLED=true`:
+  - Uses local timestamp service (no Rekor)
+  - Bundles DSSE + TSA timestamp + trust anchors
+  - Suitable for airgap deployments
+
+---
+
+### Supporting Services
+
+#### IssuerDirectory.WebService
+
+**Port:** 8451 (HTTP)
+**Database:** None (read-only configuration)
+
+**Responsibilities:**
+- **Trusted Issuer Registry:** Maintains list of authorized VEX/SBOM signers
+- **Trust Weights:** Assigns numerical trust scores (0.0 - 1.0) to issuers
+- **Seed Data:** CSAF trusted providers from official lists
+
+**Issuer Manifest:**
+```json
+{
+  "issuers": [
+    {
+      "id": "redhat",
+      "name": "Red Hat Product Security",
+      "publicKey": "-----BEGIN PUBLIC KEY-----...",
+      "trustWeight": 0.95
+    },
+    {
+      "id": "cisa",
+      "name": "CISA Cybersecurity",
+      "publicKey": "-----BEGIN PUBLIC KEY-----...",
+      "trustWeight": 1.0
+    }
+  ]
+}
+```
+
+**API:**
+- `/v1/issuers` - List all trusted issuers
+- `/v1/issuers/{id}` - Get issuer details
+
+---
+
+## Communication Patterns
+
+### 1. Scan Request Flow
+
+```
+CLI/UI
+  │
+  │ POST /v1/scans
+  │ { "imageRef": "alpine:latest" }
+  ▼
+Gateway.WebService
+  │
+  │ 1. Validate JWT + DPoP
+  │ 2. Check rate limits
+  │ 3. Route to Scanner
+  ▼
+Scanner.WebService
+  │
+  │ 1. Create scan record in PostgreSQL
+  │ 2. XADD to Valkey: scanner:jobs
+  │
+  │ ◄────────────────────────────┐
+  │                              │
+  ▼                              │
+Valkey Stream                    │
+  scanner:jobs                   │
+  │                              │
+  │ XREADGROUP (consumer group)  │
+  ▼                              │
+Scanner.Worker                   │
+  │                              │
+  │ 1. Pull OCI image            │
+  │ 2. Extract layers            │
+  │ 3. Run analyzers             │
+  │ 4. Generate SBOMs            │
+  │ 5. Upload to RustFS          │
+  │ 6. Query Concelier for linksets
+  │    └─► HTTP GET /v1/lnm/linksets/{cveId}
+  │                              │
+  │ 7. Heartbeat ────────────────┘
+  │    POST /internal/jobs/{id}/heartbeat
+  │
+  │ 8. Complete
+  │    POST /internal/jobs/{id}/complete
+  ▼
+Scanner.WebService
+  │
+  │ 1. Update scan record (status=completed)
+  │ 2. Call Policy.Gateway for verdict
+  │    POST /v1/policy/evaluate
+  │    └─► Policy.Gateway
+  │        └─► Policy Engine (OPA)
+  │            └─► PostgreSQL (advisory_raw_stream)
+  │
+  │ 3. Call Signer for DSSE signature
+  │    POST /v1/sign (mTLS + OpTok)
+  │    └─► Signer.WebService
+  │        ├─► Validate PoE (license)
+  │        ├─► Verify scanner digest (cosign)
+  │        └─► Sign DSSE envelope
+  │
+  │ 4. Call Attestor for Rekor submission
+  │    POST /v1/attest
+  │    └─► Attestor.WebService
+  │        ├─► Submit to Rekor v2
+  │        └─► Retrieve inclusion proof
+  │
+  │ 5. Store proof bundle in RustFS
+  │ 6. XADD to Valkey: events:report.ready
+  │
+  ▼
+Valkey Stream
+  events:report.ready
+  │
+  │ XREADGROUP
+  ▼
+Notify.WebService
+  │
+  │ 1. Render template
+  │ 2. XADD to Valkey: notify:delivery
+  │
+  ▼
+Valkey Stream
+  notify:delivery
+  │
+  │ XREADGROUP
+  ▼
+Notify.Worker
+  │
+  │ 1. Claim delivery job
+  │ 2. HTTP POST to Slack API
+  │ 3. Mark complete
+  ▼
+Slack Channel
+```
+
+**Communication Summary:**
+1. **Gateway → Scanner:** HTTP POST (JWT + DPoP auth)
+2. **Scanner → Valkey:** XADD (queue job)
+3. **Worker → Valkey:** XREADGROUP (consume job)
+4. **Worker → Scanner:** HTTP POST (heartbeat, completion)
+5. **Worker → Concelier:** HTTP GET (linkset query)
+6. **Scanner → Policy:** HTTP POST (policy eval)
+7. **Scanner → Signer:** HTTP POST mTLS (DSSE signing)
+8. **Scanner → Attestor:** HTTP POST (Rekor submission)
+9. **Scanner → Valkey:** XADD (event publish)
+10. **Notify.Web → Valkey:** XREADGROUP (event consume)
+11. **Notify.Worker → Slack:** HTTP POST (delivery)
+
+---
+
+### 2. Advisory Update Flow
+
+```
+Concelier.Worker (cron: every hour)
+  │
+  │ 1. Fetch CVE records from the NVD CVE API 2.0
+  │    HTTPS GET https://services.nvd.nist.gov/rest/json/cves/2.0
+  │
+  │ 2. Parse and normalize
+  │ 3. POST to Concelier.WebService
+  │    POST /internal/ingest
+  ▼
+Concelier.WebService
+  │
+  │ 1. Validate advisory format
+  │ 2. Compute linksets (CVE → PURL/CPE)
+  │ 3. INSERT INTO vuln.advisory_raw (AOC: append-only)
+  │ 4. Detect delta (new CVEs)
+  │ 5. Webhook POST to Scheduler
+  │    POST /webhooks/concelier
+  │    {
+  │      "cveIds": ["CVE-2024-1234"],
+  │      "timestamp": "2025-12-23T12:00:00Z"
+  │    }
+  │
+  ▼
+Scheduler.WebService
+  │
+  │ 1. Query BOM-Index for impacted images
+  │    SELECT DISTINCT image_ref
+  │    FROM scanner.inventory
+  │    WHERE purl IN (
+  │      SELECT purl FROM vuln.linksets
+  │      WHERE cve_id IN ('CVE-2024-1234')
+  │    )
+  │
+  │ 2. Batch results (max 100 images/run)
+  │ 3. Enforce rate limits (max 10 scans/min)
+  │ 4. Enqueue to Scheduler.Worker
+  │
+  ▼
+Scheduler.Worker
+  │
+  │ 1. Claim job from Scheduler
+  │ 2. For each image:
+  │    POST /v1/scans
+  │    {
+  │      "imageRef": "alpine:latest",
+  │      "rescan": true,
+  │      "reason": "advisory-delta"
+  │    }
+  │    └─► Scanner.WebService
+  │        └─► [Standard scan flow]
+  │
+  │ 3. Heartbeat to Scheduler
+  │ 4. Complete job
+  ▼
+Scanner.WebService
+  (Re-scan executes, new report generated)
+```
+
+**Communication Summary:**
+1. **Concelier.Worker → NVD:** HTTPS GET (fetch advisories)
+2. **Concelier.Worker → Concelier.Web:** HTTP POST (ingest)
+3. **Concelier.Web → PostgreSQL:** INSERT (advisory storage)
+4. **Concelier.Web → Scheduler:** HTTP POST webhook (delta event)
+5. **Scheduler → PostgreSQL:** SELECT (BOM-Index query for impacted images)
+6. **Scheduler.Worker → Scanner:** HTTP POST (re-scan requests)
+
+---
+
+### 3. VEX Update Flow
+
+```
+Excititor.Worker (cron: every 6 hours)
+  │
+  │ 1. Fetch Red Hat CSAF VEX feed
+  │    HTTPS GET https://www.redhat.com/security/data/csaf/
+  │
+  │ 2. Parse CSAF JSON
+  │ 3. Verify PGP signature
+  │ 4. POST to Excititor.WebService
+  │    POST /internal/ingest
+  ▼
+Excititor.WebService
+  │
+  │ 1. Verify DSSE signature
+  │    └─► IssuerDirectory.WebService
+  │        GET /v1/issuers/{issuerId}
+  │        (Retrieve public key + trust weight)
+  │
+  │ 2. Validate signature with issuer public key
+  │ 3. INSERT INTO vex.vex_raw (AOC: append-only)
+  │ 4. Compute consensus (if multiple VEX for same CVE)
+  │    UPDATE vex.consensus
+  │ 5. Detect delta (new VEX statements)
+  │ 6. Webhook POST to Scheduler
+  │    POST /webhooks/excititor
+  │    {
+  │      "cveIds": ["CVE-2024-5678"],
+  │      "status": "not_affected",
+  │      "timestamp": "2025-12-23T18:00:00Z"
+  │    }
+  │
+  ▼
+Scheduler.WebService
+  │
+  │ 1. Query BOM-Index for impacted images
+  │    (Same as advisory flow, but for VEX changes)
+  │
+  │ 2. Enqueue analysis-only jobs
+  │    (No full re-scan, just re-evaluate policy with new VEX)
+  │
+  ▼
+Scheduler.Worker
+  │
+  │ For each image:
+  │    POST /v1/scans/{scanId}/reanalyze
+  │    └─► Scanner.WebService
+  │        └─► Policy.Gateway (re-evaluate with new VEX)
+  ▼
+Scanner.WebService
+  (Policy re-evaluation, verdict updated)
+```
+
+**Communication Summary:**
+1. **Excititor.Worker → VEX source:** HTTPS GET (fetch VEX)
+2. **Excititor.Worker → Excititor.Web:** HTTP POST (ingest)
+3. **Excititor.Web → IssuerDirectory:** HTTP GET (trust verification)
+4. **Excititor.Web → PostgreSQL:** INSERT (VEX storage)
+5. **Excititor.Web → Scheduler:** HTTP POST webhook (delta event)
+6. **Scheduler.Worker → Scanner:** HTTP POST (re-analyze request)
+
+---
+
+### 4. Notification Delivery Flow
+
+```
+Scanner.WebService
+  (Scan completed, verdict computed)
+  │
+  │ XADD to Valkey Stream
+  │ events:report.ready
+  │ {
+  │   "scanId": "scan-123",
+  │   "imageRef": "alpine:latest",
+  │   "verdict": "FAIL",
+  │   "criticalCount": 3
+  │ }
+  ▼
+Valkey Stream: events:report.ready
+  │
+  │ XREADGROUP (consumer group: notify-delivery)
+  ▼
+Notify.WebService
+  │
+  │ 1. SELECT channel config from PostgreSQL
+  │    (Slack, Teams, Email channels)
+  │
+  │ 2. SELECT template from PostgreSQL
+  │    (Liquid template: "New vulnerabilities found...")
+  │
+  │ 3. Check throttle limits
+  │    (Max 10 notifications/hour per channel)
+  │
+  │ 4. Render template with event data
+  │
+  │ 5. XADD to Valkey Stream
+  │    notify:delivery
+  │    {
+  │      "channelId": "slack-security",
+  │      "renderedMessage": "🚨 Critical: 3 vulns in alpine:latest",
+  │      "deliveryId": "delivery-456"
+  │    }
+  ▼
+Valkey Stream: notify:delivery
+  │
+  │ XREADGROUP (consumer group: notify-workers)
+  ▼
+Notify.Worker
+  │
+  │ 1. Claim delivery job
+  │ 2. Check idempotency (deliveryId seen before?)
+  │ 3. HTTP POST to Slack API
+  │    POST https://slack.com/api/chat.postMessage
+  │    {
+  │      "channel": "#security",
+  │      "text": "🚨 Critical: 3 vulns in alpine:latest",
+  │      "attachments": [...]
+  │    }
+  │
+  │ 4. Record delivery in PostgreSQL
+  │    INSERT INTO notify.delivery_history
+  │ 5. XACK to Valkey (mark complete)
+  ▼
+Slack Channel #security
+```
+
+**Communication Summary:**
+1. **Scanner → Valkey:** XADD (publish event)
+2. **Notify.Web → Valkey:** XREADGROUP (consume event)
+3. **Notify.Web → PostgreSQL:** SELECT (channel config, template)
+4. **Notify.Web → Valkey:** XADD (queue delivery job)
+5. **Notify.Worker → Valkey:** XREADGROUP (consume delivery job)
+6. **Notify.Worker → Slack API:** HTTP POST (deliver notification)
+7. **Notify.Worker → PostgreSQL:** INSERT (delivery history)
+8. **Notify.Worker → Valkey:** XACK (acknowledge completion)
+
+---
+
+### 5. Policy Evaluation Flow
+
+```
+Scanner.WebService
+  (Scan completed, SBOM generated)
+  │
+  │ POST /v1/policy/evaluate
+  │ {
+  │   "scanId": "scan-123",
+  │   "findings": [
+  │     {
+  │       "cveId": "CVE-2024-1234",
+  │       "purl": "pkg:apk/alpine/openssl@3.0.1",
+  │       "severity": "CRITICAL",
+  │       "reachability": "REACHABLE"
+  │     }
+  │   ]
+  │ }
+  ▼
+Policy.Gateway
+  │
+  │ 1. SELECT exceptions from PostgreSQL
+  │    (Check for approved false positives)
+  │
+  │ 2. POST /v1/policy/eval to Policy Engine
+  │    └─► Policy Engine (OPA/Rego)
+  │        │
+  │        │ Data sources:
+  │        ├─► PostgreSQL logical replication
+  │        │   • advisory_raw_stream (advisory data)
+  │        │   • vex_raw_stream (VEX data)
+  │        │
+  │        │ Policy rules:
+  │        ├─► unknowns-budget.rego
+  │        │   (Limit unresolved CVEs to max 10)
+  │        ├─► severity-gates.rego
+  │        │   (Block CRITICAL, allow HIGH with approval)
+  │        ├─► reachability-gates.rego
+  │        │   (Allow UNREACHABLE findings)
+  │        └─► vex-override.rego
+  │            (If VEX status = not_affected, allow)
+  │
+  │ 3. Return verdict
+  │    {
+  │      "verdict": "FAIL",
+  │      "blockedFindings": [
+  │        {
+  │          "cveId": "CVE-2024-1234",
+  │          "reason": "CRITICAL severity + REACHABLE"
+  │        }
+  │      ],
+  │      "allowedFindings": [
+  │        {
+  │          "cveId": "CVE-2024-5678",
+  │          "reason": "VEX not_affected"
+  │        }
+  │      ]
+  │    }
+  ▼
+Scanner.WebService
+  │
+  │ 1. Store verdict in PostgreSQL
+  │    UPDATE scanner.scan_manifests
+  │    SET verdict = 'FAIL'
+  │ 2. Return to caller
+  ▼
+CLI/UI
+```
+
+**Communication Summary:**
+1. **Scanner → Policy.Gateway:** HTTP POST (policy eval request)
+2. **Policy.Gateway → PostgreSQL:** SELECT (exceptions)
+3. **Policy.Gateway → Policy Engine:** HTTP POST (OPA eval)
+4. **Policy Engine → PostgreSQL:** Logical replication read (advisory/VEX data)
+5. **Policy.Gateway → Scanner:** HTTP 200 (verdict response)
+6. **Scanner → PostgreSQL:** UPDATE (store verdict)
+
+---
+
+## Database Schema Isolation
+
+Each service has a dedicated PostgreSQL schema for strict isolation:
+
+### authority
+
+**Owner:** Authority.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `users` | User accounts (LDAP-synced or local) |
+| `clients` | OAuth2 clients (service accounts) |
+| `tenants` | Multi-tenant organization data |
+| `keys` | Signing keys (JWK format) |
+| `tokens` | OpTok refresh tokens (rotation-protected) |
+| `audit_log` | Authentication/authorization events |
+| `dpop_nonces` | (Migrated to Valkey for performance) |
+
+**Indexes:**
+- `users.email` (unique)
+- `clients.client_id` (unique)
+- `tenants.slug` (unique)
+- `audit_log.timestamp, tenant_id` (composite)
+
+### scanner
+
+**Owner:** Scanner.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `scan_manifests` | Scan metadata, status, verdicts |
+| `proof_bundles` | DSSE envelopes + Rekor receipts |
+| `triage` | False positives, waiver approvals |
+| `epss` | EPSS scores (daily refresh from FIRST.org) |
+| `cg_node` | Call graph nodes (functions) |
+| `cg_edge` | Call graph edges (function calls) |
+| `inventory` | Package inventory (PURL → scan mapping) |
+
+**Indexes:**
+- `scan_manifests.image_ref, created_at` (composite)
+- `inventory.purl` (GIN trigram index via `pg_trgm`, for LIKE queries)
+- `cg_node.function_signature` (unique)
+- `cg_edge.source_id, target_id` (composite)
+
+### vuln
+
+**Owner:** Concelier.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `advisory_raw` | Immutable advisory documents (AOC) |
+| `linksets` | CVE → PURL/CPE mappings with version ranges |
+| `observations` | Merge conflicts, priority overrides |
+
+**Logical Replication:**
+- `advisory_raw_stream` → Policy Engine (tenant-scoped)
+
+**Indexes:**
+- `advisory_raw.cve_id` (GIN array index)
+- `linksets.cve_id, purl` (composite)
+
+### vex
+
+**Owner:** Excititor.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `vex_raw` | Immutable VEX statements (AOC) |
+| `consensus` | Resolved VEX status (weighted voting) |
+| `provider_state` | Last-fetch timestamps per VEX source |
+
+**Logical Replication:**
+- `vex_raw_stream` → Policy Engine (tenant-scoped)
+
+**Indexes:**
+- `vex_raw.cve_id, issuer_id` (composite)
+- `consensus.cve_id` (unique)
+
+### scheduler
+
+**Owner:** Scheduler.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `graph_jobs` | Re-scan job definitions (advisory/VEX delta) |
+| `runs` | Job run instances (status, progress) |
+| `schedules` | Cron schedules for periodic scans |
+| `impact_snapshots` | BOM-Index query results (cached) |
+
+**Indexes:**
+- `runs.job_id, created_at` (composite)
+- `impact_snapshots.cve_id` (GIN array index)
+
+### notify
+
+**Owner:** Notify.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `channels` | Slack, Teams, Email, Webhook configs |
+| `templates` | Liquid templates for notifications |
+| `delivery_history` | Sent notifications (idempotency, SLO tracking) |
+| `digest_state` | Digest accumulation (hourly/daily batches) |
+
+**Indexes:**
+- `delivery_history.delivery_id` (unique)
+- `delivery_history.channel_id, created_at` (composite)
+
+### policy
+
+**Owner:** Policy.Gateway
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `exception_objects` | Approved false positives, waivers |
+| `snapshots` | Policy baseline snapshots for delta |
+| `unknowns` | Unresolved CVEs (no fix available) |
+
+**Indexes:**
+- `exception_objects.cve_id, image_ref` (composite)
+- `unknowns.cve_id` (unique)
+
+### orchestrator
+
+**Owner:** Orchestrator.WebService
+
+**Tables:**
+
+| Table | Purpose |
+|-------|---------|
+| `sources` | Job sources (Git repos, webhooks) |
+| `runs` | Orchestrated run instances |
+| `jobs` | Individual jobs within runs |
+| `dags` | Job dependency graphs |
+| `pack_runs` | Atomic multi-job bundles |
+
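The Orchestrator's `dags` table stores the job dependency edges behind its DAG workflows. The sketch below shows how a coordinator might turn those edges into dispatch waves. It assumes each row is a `(parent_job_id, child_job_id)` pair, mirroring the `dags` index listed for this schema, and that a child may start only after its parent completes. It uses Python's standard-library `graphlib`. The function name `execution_waves` and the job names are hypothetical, and this is not the Orchestrator's actual implementation.

```python
from graphlib import CycleError, TopologicalSorter


def execution_waves(edges):
    """Group job IDs into waves whose members can run in parallel.

    `edges` is an iterable of (parent_job_id, child_job_id) pairs, a hypothetical
    in-memory view of the `dags` table: a child may start only after its parent
    has completed. Raises graphlib.CycleError if the edges do not form a DAG.
    """
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)  # register `parent` as a predecessor of `child`
    sorter.prepare()  # detects cycles before any job is dispatched
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # a live coordinator would call this as jobs report completion
    return waves


if __name__ == "__main__":
    # Hypothetical workflow modeled on "scan → policy → VEX → attest", plus a side branch.
    dag = [("scan", "policy"), ("scan", "sbom-export"), ("policy", "vex"), ("vex", "attest")]
    print(execution_waves(dag))  # → [['scan'], ['policy', 'sbom-export'], ['vex'], ['attest']]
    try:
        execution_waves([("a", "b"), ("b", "a")])
    except CycleError:
        print("cyclic dependency rejected")
```

Sorting each wave keeps the output stable regardless of the order in which edges were inserted. Calling `prepare()` up front rejects a cyclic graph before any job is claimed.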
+**Indexes:**
+- `jobs.run_id, status` (composite)
+- `dags.parent_job_id, child_job_id` (composite)
+
+---
+
+## Security Boundaries
+
+### Authentication & Authorization
+
+**All services** enforce:
+1. **JWT Validation:** OpTok signature verification (RS256/ES256)
+2. **DPoP Verification:** Sender constraint validation (RFC 9449)
+3. **Scope-Based Access:** RBAC claims in OpTok (`scan:read`, `policy:write`, etc.)
+4. **Tenant Isolation:** All queries filtered by `tenant_id` from OpTok
+
+**Authority Hard Gates:**
+- DPoP nonce must be unused (30s TTL in Valkey)
+- OpTok lifetime must be under 15 minutes
+- mTLS certificate must match client_id
+
+**Signer Hard Gates:**
+- PoE (Proof of Entitlement) must be a valid license
+- Scanner image digest must be cosign-signed by Stella Ops
+- OpTok must have `sign:dsse` scope
+
+### Network Segmentation
+
+**Production Deployment:**
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  PUBLIC INTERNET                                            │
+└──────────────────────┬──────────────────────────────────────┘
+                       │
+                       │ HTTPS (TLS 1.3)
+                       ▼
+┌─────────────────────────────────────────────────────────────┐
+│  LOAD BALANCER / WAF                                        │
+│  • Rate limiting (IP-based)                                 │
+│  • DDoS protection                                          │
+│  • TLS termination                                          │
+└──────────────────────┬──────────────────────────────────────┘
+                       │
+                       │ Internal HTTP
+                       ▼
+┌─────────────────────────────────────────────────────────────┐
+│  DMZ - Gateway Layer                                        │
+│  ┌────────────────────────────────────────┐                 │
+│  │  Gateway.WebService                    │                 │
+│  │  • JWT + DPoP validation               │                 │
+│  │  • Tenant resolution                   │                 │
+│  └────────────────────────────────────────┘                 │
+└──────────────────────┬──────────────────────────────────────┘
+                       │
+                       │ Internal mTLS (optional)
+                       ▼
+┌─────────────────────────────────────────────────────────────┐
+│  APPLICATION LAYER (Internal)                               │
+│  • Scanner.WebService                                       │
+│  • Concelier.WebService                                     │
+│  • Policy.Gateway                                           │
+│  • Scheduler.WebService                                     │
+│  • Notify.WebService                                        │
+│  • Orchestrator.WebService                                  │
+│                                                             │
+│  Network Policy: Only Gateway can initiate connections      │
+└──────────────────────┬──────────────────────────────────────┘
+                       │
+                       │ PostgreSQL protocol (TLS)
+                       │ Valkey protocol (TLS optional)
+                       │ S3 API (HTTPS)
+                       ▼
+┌─────────────────────────────────────────────────────────────┐
+│  DATA LAYER (Isolated Subnet)                               │
+│  • PostgreSQL (private IP only)                             │
+│  • Valkey (private IP only)                                 │
+│  • RustFS (private IP only)                                 │
+│                                                             │
+│  Network Policy: No outbound internet, inbound from app     │
+└─────────────────────────────────────────────────────────────┘
+
+┌─────────────────────────────────────────────────────────────┐
+│  PRIVILEGED SERVICES (Separate Subnet)                      │
+│  • Authority (TLS 8440)                                     │
+│  • Signer (mTLS 8441)                                       │
+│  • Attestor (HTTPS 8442)                                    │
+│                                                             │
+│  Network Policy: mTLS required, audit all access            │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Data Encryption
+
+**At Rest:**
+- PostgreSQL: Transparent Data Encryption (TDE) or LUKS full-disk encryption
+- Valkey: No encryption at rest (transient data only: cache entries, stream messages, DPoP nonces with a 30s TTL)
+- RustFS: Server-side encryption (SSE-S3, AES-256)
+
+**In Transit:**
+- External: TLS 1.3 (Gateway → clients)
+- Internal: Optional mTLS (Gateway → services)
+- PostgreSQL: TLS (required in production)
+- Valkey: TLS optional (enabling it is recommended)
+- RustFS: HTTPS (required)
+
+### Audit Logging
+
+**All services log to PostgreSQL:**
+
+| Event | Service | Table |
+|-------|---------|-------|
+| Authentication | Authority | `authority.audit_log` |
+| Authorization denials | Gateway | `authority.audit_log` |
+| DSSE signing | Signer | `authority.audit_log` (via OpTok validation) |
+| Policy exceptions | Policy.Gateway | `policy.exception_objects` (approval trail) |
+| Scan triggers | Scanner | `scanner.scan_manifests` (audit columns) |
+
+**Audit Trail Requirements (SOC 2):**
+- Who (user/client ID)
+- What (action performed)
+- When (ISO 8601 timestamp)
+- Where (tenant ID, IP address)
+- Result (success/failure, reason)
+
+**Retention:**
+- Audit logs: 90 days minimum (configurable per tenant)
+- Compliance mode: 7-year retention for regulated industries
+
+---
+
+## Summary
+
+**Key Architectural Principles:**
+
+1. **Schema Isolation:** Each service owns its PostgreSQL schema; no cross-schema foreign keys
+2. **Event-Driven:** Valkey Streams for async communication (scan jobs, notifications)
+3. **Webhook Integration:** Concelier/Excititor → Scheduler for delta events
+4. **Append-Only Data:** AOC for advisories and VEX (immutable, audit-friendly)
+5. **Strong Authentication:** JWT + DPoP for all API calls, OpTok for service-to-service
+6. **Hard Gates:** Signer enforces licensing and scanner authenticity
+7. **Multi-Tenancy:** Tenant ID in all data, tenant-scoped logical replication
+8. **Transparency:** Rekor v2 for public auditability, offline bundles for airgap
+
+**Communication Patterns:**
+
+| Pattern | Technology | Use Case |
+|---------|------------|----------|
+| Synchronous HTTP | REST APIs | Scanner → Concelier linkset queries |
+| Asynchronous Queue | Valkey Streams | Scanner jobs, Notify delivery |
+| Event Publishing | Valkey Streams | `report.ready`, `drift.detected` |
+| Webhooks | HTTP POST | Concelier/Excititor → Scheduler |
+| Database Replication | PostgreSQL Logical Replication | Policy Engine advisory/VEX data |
+| Object Storage | S3 API (RustFS) | SBOM artifacts, proof bundles |
+
+**Security Model:**
+- **Gateway:** Enforces authentication, authorization, rate limiting
+- **Authority:** Issues OpToks with DPoP binding (sender constraint)
+- **Signer:** Hard gates on PoE and scanner authenticity
+- **Tenant Isolation:** All queries filtered by `tenant_id`
+- **Audit Trails:** All privileged actions logged to PostgreSQL
+
+This architecture provides **deterministic, reproducible vulnerability scanning** with **strong cryptographic provenance** (DSSE + Rekor), **multi-tenant isolation**, and **VEX-first decisioning** for exploitability analysis.
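The asynchronous queue and event-publishing patterns in the Communication Patterns table rely on Valkey Streams consumer groups (`XADD`, `XREADGROUP`, `XACK`). Notify.Worker additionally relies on `deliveryId` for idempotency. The following is an in-memory model of those semantics, written only to illustrate the pattern. It is not a Valkey client and it simplifies entry IDs to `0-<n>`. The names `StreamModel` and `drain` are hypothetical. The `seen_delivery_ids` set stands in for the unique `delivery_id` in `notify.delivery_history`.

```python
from dataclasses import dataclass, field


@dataclass
class StreamModel:
    """Toy model of one stream read by a single consumer group (illustration only).

    Real Valkey entry IDs look like "<milliseconds>-<sequence>"; this model uses "0-<n>".
    """

    entries: list = field(default_factory=list)  # (entry_id, fields) in append order
    pending: dict = field(default_factory=dict)  # entry_id -> consumer: the pending entries list
    next_undelivered: int = 0                    # group cursor: index of first never-delivered entry
    seq: int = 0                                 # counter used to mint entry IDs

    def xadd(self, fields):
        """Append an entry and return its ID (models XADD <stream> * field value ...)."""
        self.seq += 1
        entry_id = f"0-{self.seq}"
        self.entries.append((entry_id, dict(fields)))
        return entry_id

    def xreadgroup(self, consumer, count=10):
        """Return up to `count` never-delivered entries (the '>' ID) and mark them pending."""
        start = self.next_undelivered
        batch = self.entries[start:start + count]
        self.next_undelivered += len(batch)
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer
        return batch

    def xack(self, entry_id):
        """Drop an entry from the pending list; returns 1 if it was pending, else 0."""
        return 1 if self.pending.pop(entry_id, None) is not None else 0


def drain(stream, consumer, deliver, seen_delivery_ids):
    """One worker pass: deliver each new deliveryId once, then acknowledge the entry."""
    for entry_id, fields in stream.xreadgroup(consumer):
        delivery_id = fields["deliveryId"]
        if delivery_id not in seen_delivery_ids:  # stand-in for a unique delivery_id check
            deliver(fields)
            seen_delivery_ids.add(delivery_id)
        stream.xack(entry_id)  # acknowledge only after the entry has been handled


if __name__ == "__main__":
    stream = StreamModel()
    payload = {"deliveryId": "delivery-456", "channelId": "slack-security"}
    stream.xadd(payload)
    stream.xadd(payload)  # the same delivery published twice
    sent = []
    drain(stream, "notify-worker-1", sent.append, set())
    print(len(sent), stream.pending)  # → 1 {}
```

Consumer groups give at-least-once delivery. An entry read with `XREADGROUP` but never acknowledged stays in the group's pending entries list. On a real server, another consumer can take it over with `XCLAIM` or `XAUTOCLAIM`, which this model omits. That possible redelivery is why the worker checks `deliveryId` before calling the external channel and acknowledges only afterwards.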
+
+---
+
+**For More Information:**
+- [Developer Onboarding](./DEVELOPER_ONBOARDING.md) - Quick start guide
+- [High-Level Architecture](./07_HIGH_LEVEL_ARCHITECTURE.md) - Business-level overview
+- [API/CLI Reference](./09_API_CLI_REFERENCE.md) - Endpoint documentation
+- [Offline Kit](./24_OFFLINE_KIT.md) - Airgap deployment guide
diff --git a/docs/DEVELOPER_ONBOARDING.md b/docs/DEVELOPER_ONBOARDING.md
index f4f20588a..a91fff019 100644
--- a/docs/DEVELOPER_ONBOARDING.md
+++ b/docs/DEVELOPER_ONBOARDING.md
@@ -19,7 +19,62 @@
 StellaOps is a deterministic SBOM + VEX platform built as a microservices
 architecture with 36+ services organized into functional domains.
 
-### Runtime Topology - High-Level
+**📖 For detailed component architecture with communication patterns, see [ARCHITECTURE_DETAILED.md](./ARCHITECTURE_DETAILED.md)**
+
+### Quick Reference - Component Topology
+
+```
+CLIENT LAYER
+├─ stella CLI → Gateway (JWT + DPoP auth)
+├─ Web UI (Angular) → Gateway (JWT + DPoP auth)
+├─ CI/CD Pipelines → Gateway (JWT + DPoP auth)
+└─ Zastava Observer → Scanner (runtime scans)
+
+INFRASTRUCTURE (REQUIRED)
+├─ PostgreSQL v16+ → Primary database (ALL services)
+├─ Valkey v8.0 → Cache, DPoP, queues, events
+└─ RustFS → Object storage (S3 API)
+
+INFRASTRUCTURE (OPTIONAL)
+└─ NATS JetStream → Alternative messaging (Valkey is default)
+
+GATEWAY LAYER
+└─ Gateway.WebService → Auth, routing, rate limiting
+
+AUTH & CRYPTO
+├─ Authority → OAuth2/OIDC, OpTok issuance
+├─ Signer → DSSE signing (FIPS/GOST/SM)
+└─ Attestor → Rekor v2 transparency log
+
+CORE ENGINES
+├─ Scanner.WebService → Scan orchestration
+├─ Scanner.Worker → Image analysis, SBOM generation
+├─ Concelier.WebService → Advisory ingestion (NVD, Red Hat, etc.)
+├─ Excititor.WebService → VEX ingestion + consensus
+├─ Policy.Gateway → OPA/Rego policy evaluation
+├─ Scheduler.WebService → Re-scan orchestration
+├─ Notify.WebService → Notification orchestration
+├─ Notify.Worker → Slack/Teams/Email delivery
+└─ Orchestrator.WebService → DAG workflows, pack runs
+
+SUPPORTING
+└─ IssuerDirectory → VEX issuer trust registry
+```
+
+### Service Categories
+
+| Category | Services | Purpose |
+|----------|----------|---------|
+| **Gateway** | Gateway.WebService | API routing, auth enforcement |
+| **Auth & Security** | Authority, Signer, Attestor | OAuth2, signing, transparency |
+| **Scanning** | Scanner.Web, Scanner.Worker | Container analysis, SBOM |
+| **Advisory** | Concelier.Web, Concelier.Worker | Vulnerability ingestion |
+| **VEX** | Excititor.Web, Excititor.Worker | Exploitability statements |
+| **Policy** | Policy.Gateway, Policy Engine | OPA/Rego evaluation |
+| **Orchestration** | Scheduler, Orchestrator | Job coordination |
+| **Notifications** | Notify.Web, Notify.Worker | Delivery to Slack/Teams/Email |
+
+### Runtime Topology - Infrastructure Dependencies
 
 ```
 ┌─────────────────────────────────────────────────────────────────────┐
diff --git a/docs/implplan/SPRINT_4300_0002_0001_unknowns_budget_policy.md b/docs/implplan/archived/SPRINT_4300_0002_0001_unknowns_budget_policy.md
similarity index 100%
rename from docs/implplan/SPRINT_4300_0002_0001_unknowns_budget_policy.md
rename to docs/implplan/archived/SPRINT_4300_0002_0001_unknowns_budget_policy.md
diff --git a/docs/implplan/SPRINT_4300_0002_0002_unknowns_attestation_predicates.md b/docs/implplan/archived/SPRINT_4300_0002_0002_unknowns_attestation_predicates.md
similarity index 100%
rename from docs/implplan/SPRINT_4300_0002_0002_unknowns_attestation_predicates.md
rename to docs/implplan/archived/SPRINT_4300_0002_0002_unknowns_attestation_predicates.md
diff --git a/docs/implplan/SPRINT_4300_0003_0001_sealed_knowledge_snapshot.md b/docs/implplan/archived/SPRINT_4300_0003_0001_sealed_knowledge_snapshot.md
similarity index 100%
rename from docs/implplan/SPRINT_4300_0003_0001_sealed_knowledge_snapshot.md
rename to docs/implplan/archived/SPRINT_4300_0003_0001_sealed_knowledge_snapshot.md