High‑Level Architecture — Stella Ops (Consolidated • 2025Q4)
Want the 10-minute tour? See high-level-architecture.md; this file retains the exhaustive reference.
Purpose. A complete, implementation‑ready map of Stella Ops: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
Scope. This file replaces the separate components.md; all component details now live here.
0) Product vision & principles
Vision. Stella Ops is a deterministic SBOM + VEX platform for CI/CD and runtime, tuned for speed (per‑layer deltas), quiet output (usage‑scoped views), and verifiability (DSSE + Rekor v2). It is self‑hostable, air‑gap capable, and commercially enforceable: only licensed installations can produce Stella Ops‑verified attestations.
Operating principles.
- Scanner‑owned SBOMs. We generate our own BOMs; we do not warehouse third‑party SBOM content (we can link to attested SBOMs).
- Deterministic evidence. Facts come from package DBs, installed metadata, linkers, and verified attestations; no fuzzy guessing in the core.
- Per-layer caching. Cache fragments by layer digest and compose image SBOMs via CycloneDX BOM-Link / SPDX ExternalRef.
- Inventory vs Usage. Always record the full inventory of what exists; separately present usage (entrypoint closure + loaded libs).
- Backend decides. PASS/FAIL is produced by Policy + VEX + Advisories. The scanner reports facts.
- VEX-first triage UX. Operators triage by artifact with evidence-first cards, VEX decisioning, and immutable audit bundles; see docs/product-advisories/archived/27-Nov-2025-superseded/28-Nov-2025 - Vulnerability Triage UX & VEX-First Decisioning.md.
- Attest or it didn't happen. Every export is signed as in-toto/DSSE and logged in Rekor v2.
- Hybrid reachability attestations. Every reachability graph ships with a graph-level DSSE (mandatory) plus optional edge-bundle DSSEs for runtime/init/contested edges; Policy/Signals consume graph DSSE as baseline and edge bundles for quarantine/disputes. See docs/reachability/hybrid-attestation.md for verification runbooks, Rekor guidance, and offline replay steps.
- Sovereign-ready. Cloud is used only for licensing and optional endorsement; everything else is first-party and self-hostable.
- Competitive clarity. Moats: deterministic replay, hybrid reachability proofs, lattice VEX, sovereign crypto, proof graph; see docs/market/competitive-landscape.md.
1) Service topology & trust boundaries
1.1 Runtime inventory (first‑party)
| Service / Tool | Container image | Core role | Scale pattern |
|---|---|---|---|
| Scanner.WebService | stellaops/scanner-web | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports; analysis‑only report runs for Scheduler. | Stateless; N replicas behind LB. |
| Scanner.Worker | stellaops/scanner-worker | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach‑O, EntryTrace); emits per‑layer SBOMs and composes image SBOMs. | Horizontal; queue‑driven; sharded by layer digest. |
| Scanner.Sbomer.BuildXPlugin | stellaops/sbom-indexer | BuildKit generator for build‑time SBOMs as OCI referrers. | CI‑side; ephemeral. |
| Scanner.Sbomer.DockerImage | stellaops/scanner-cli | CLI‑orchestrated scanner container for post‑build scans. | Local/CI; ephemeral. |
| Concelier.WebService | stellaops/concelier-web | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via PostgreSQL locks. |
| Excititor.WebService | stellaops/excititor-web | VEX ingest/normalize/consensus; conflict retention; exports. | HA via PostgreSQL locks. |
| Policy Engine | (in scanner-web) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage‑gating); produces policy digest. | In‑process; cache per digest. |
| Scheduler.WebService | stellaops/scheduler-web | Schedules re‑evaluation runs; consumes Concelier/Excititor deltas; selects impacted images via BOM‑Index; orchestrates analysis‑only reports. | Stateless API. |
| Scheduler.Worker | stellaops/scheduler-worker | Executes selection and enqueues batches toward Scanner; enforces rate/limits and windows; maintains impact cursors. | Horizontal; queue‑driven. |
| Notify.WebService | stellaops/notify-web | Rules engine for outbound notifications; manages channels, templates, throttle/digest logic. | Stateless API. |
| Notify.Worker | stellaops/notify-worker | Delivers to Slack/Teams/Email/Webhooks; idempotent retries; digests. | Horizontal; per‑channel rate limits. |
| Signer | stellaops/signer | Hard gate: validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| Attestor | stellaops/attestor | Posts DSSE bundles to Rekor v2; verification endpoints. | Stateless; HPA by QPS. |
| Authority | stellaops/authority | On‑prem OIDC issuing short‑lived OpToks with DPoP/mTLS sender constraint. | HA behind LB. |
| Zastava (Runtime) | stellaops/zastava | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| Web UI | stellaops/ui | Angular app for scans, diffs, policy, VEX, vulnerability triage (artifact-first), audit bundles, Scheduler, Notify, runtime, reports. | Stateless. |
| StellaOps.Cli | stellaops/cli | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; schedule and notify verbs. | Local/CI. |
1.2 Third‑party (self‑hosted)
- Fulcio (Sigstore CA) — issues short‑lived signing certs (keyless).
- Rekor v2 (tile‑backed transparency log).
- RustFS — offline-first object store with deterministic REST API (S3/MinIO fallback available for legacy installs).
- PostgreSQL (≥16) — primary control-plane storage with per-module schema isolation (auth, vuln, vex, scheduler, notify, policy, audit). See Database Architecture.
- Queue — Redis Streams / NATS / RabbitMQ (pluggable).
- OCI Registry — must support Referrers API (discover SBOMs/signatures).
1.3 Cloud licensing (Stella Ops)
- Licensing Service (www.stella-ops.org) — issues long‑lived License Tokens (LT); exchanges LT → Proof‑of‑Entitlement (PoE) bound to an installation key; revoke/introspect PoE; optional cross‑log endorsement.
1.4 Diagram (control/data planes & trust)
flowchart LR
subgraph Cloud["www.stella-ops.org (Cloud)"]
LS[Licensing Service<br/>LT→PoE / revoke / introspect]
end
subgraph OnPrem["Customer Site (Self-hosted)"]
Auth[Authority (OIDC)\nOpTok (DPoP/mTLS)]
SW[Scanner.WebService]
WK[Scanner.Worker xN]
CONC[Concelier]
EXC[Excititor]
SCHW[Scheduler.Web]
SCH[Scheduler.Worker xN]
NOTW[Notify.Web]
NOT[Notify.Worker xN]
POL[Policy Engine (in Scanner.Web)]
SGN[Signer\n(entitlement + signing)]
ATT[Attestor\n(Rekor v2 submit/verify)]
UI[Web UI (Angular)]
Z[Zastava\n(Runtime Inspector/Enforcer)]
RFS[(RustFS object store)]
PG[(PostgreSQL)]
QUE[(Queue/Streams)]
end
CLI[StellaOps.Cli / Buildx Plugin]
REG[(OCI Registry with Referrers)]
FUL[ Fulcio ]
REK[ Rekor v2 (tiles) ]
CLI -->|scan/build| SW
SW -->|jobs| QUE
QUE --> WK
WK --> RFS
SW --> PG
CONC --> PG
EXC --> PG
UI --> SW
Z --> SW
%% New event-driven loop
CONC -- export.delta --> SCHW
EXC -- export.delta --> SCHW
SCHW --> SCH
SCH --> SW
SW -- report.ready --> NOTW
Z -- admission/observe --> NOTW
SGN <--> Auth
SGN --> FUL
SGN -->|mTLS| ATT
ATT --> REK
SGN <-->|verify referrers| REG
Trust boundaries. Only Signer can sign; only Attestor can write to Rekor v2. Scanner/UI/Scheduler/Notify never sign.
2) Licensing & tokens (installation‑ready, theft‑resistant)
Two‑token model.
- License Token (LT) — long‑lived JWT from Licensing Service; used once to enroll the installation; never used in hot path.
- Proof‑of‑Entitlement (PoE) — bound to the installation key (mTLS client cert or DPoP‑bound JWT with cnf); medium‑lived; renewable; revocable.
- Operational token (OpTok) — 2–5 min OIDC token from Authority, sender‑constrained (DPoP or mTLS). Used to authenticate to Signer/Scanner.WebService/Scheduler.Web/Notify.Web.
Signer enforces both: PoE proves entitlement; OpTok proves “who is calling now”. It also independently verifies the scanner image digest is Stella Ops‑signed via Referrers + cosign before signing anything.
Enrollment sequence (LT → PoE).
@startuml
actor Operator
participant "Install Agent" as IA
participant "Licensing Service" as LS
Operator -> IA: Provide LT
IA -> IA: Generate K_inst
IA -> LS: /license/enroll {LT, pub(K_inst)}
LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
@enduml
3) Scanner subsystem (facts engine)
3.1 Analyzers (deterministic only)
- OS packages: apk/dpkg/rpm (Linux); Windows MSI/SxS/GAC (M2).
- Language (installed state):
  - Java (pom.properties / MANIFEST) → pkg:maven/...
  - Node (node_modules/*/package.json) → pkg:npm/...
  - Python (*.dist-info/METADATA) → pkg:pypi/...
  - Go (buildinfo) → pkg:golang/...
  - .NET (*.deps.json) → pkg:nuget/...
  - Rust: deterministic language markers (symbol mangling) and crates only when present; otherwise bin:{sha256}.
- Native: ELF/PE/Mach‑O imports, DT_NEEDED, RPATH/RUNPATH, symbol versions, PE version info.
- EntryTrace: parse ENTRYPOINT/CMD; shell AST; resolve launchers (Java/Node/Python) to terminal program; record file:line chain.
3.2 Caching & composition
- Layer cache: {layerDigest → SBOM fragment + analyzer meta}.
- File CAS: {sha256(file) → parse result (ELF/JAR metadata/etc.)}.
- Composition: build image SBOMs from fragments via BOM‑Link/ExternalRef; emit two views:
  - Inventory (complete filesystem inventory).
  - Usage (entrypoint closure + linked libs).
- Transport: JSON and CycloneDX Protobuf (compact, fast to parse).
- Index: BOM‑Index sidecar with purl table + roaring bitmap + usedByEntrypoint flag for fast joins.
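The layer cache can be pictured as a digest-keyed map, so that a layer shared by several images is analyzed only once. The following is a minimal sketch under stated assumptions: `compose_inventory`, the `analyze` callback, and the fragment shape are hypothetical names for illustration, and an in-memory dict stands in for the real cache.

```python
from typing import Callable, Dict, List


def compose_inventory(
    image_digest: str,
    layer_digests: List[str],
    cache: Dict[str, dict],
    analyze: Callable[[str], dict],
) -> dict:
    """Compose an image-level view that references per-layer fragments.

    Only layers missing from the cache are analyzed; the image view links
    to fragments by digest instead of copying their content.
    """
    for digest in layer_digests:
        if digest not in cache:
            cache[digest] = analyze(digest)  # cache miss: run analyzers once
    return {"image": image_digest, "fragments": list(layer_digests)}


# Hypothetical analyzer that records which layers it was asked to scan.
calls: List[str] = []


def fake_analyze(layer_digest: str) -> dict:
    calls.append(layer_digest)
    return {"layer": layer_digest, "components": []}


cache: Dict[str, dict] = {}
image_a = compose_inventory("sha256:img-a", ["sha256:base", "sha256:app-a"], cache, fake_analyze)
image_b = compose_inventory("sha256:img-b", ["sha256:base", "sha256:app-b"], cache, fake_analyze)
```

In this sketch the shared `sha256:base` layer is analyzed for the first image only; the second image reuses the cached fragment.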
3.3 Diff (image → layer → package)
- Added / Removed / Version‑changed changes, attributed to the layer that caused them.
- Raw diffs preserved; backend view applies VEX + Policy.
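A layer-attributed diff could look roughly like the sketch below. It assumes a simplified model that is not taken from the Scanner itself: each image is an ordered list of `(layerDigest, {package: version})` pairs, later layers override earlier ones, and a version of `None` marks a removal. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Layer = Tuple[str, Dict[str, Optional[str]]]


def flatten(layers: List[Layer]):
    """Replay layers in order; track which layer set or removed each package."""
    state: Dict[str, Tuple[str, str]] = {}  # name -> (version, layer digest)
    removed_by: Dict[str, str] = {}         # name -> layer digest that removed it
    for digest, packages in layers:
        for name, version in packages.items():
            if version is None:
                state.pop(name, None)
                removed_by[name] = digest
            else:
                state[name] = (version, digest)
                removed_by.pop(name, None)
    return state, removed_by


def diff_images(old_layers: List[Layer], new_layers: List[Layer]) -> dict:
    """Return added/removed/changed packages, each attributed to a layer."""
    old, _ = flatten(old_layers)
    new, removed_by = flatten(new_layers)
    added = sorted((n, v, layer) for n, (v, layer) in new.items() if n not in old)
    removed = sorted((n, old[n][0], removed_by.get(n)) for n in old if n not in new)
    changed = sorted(
        (n, old[n][0], v, layer)
        for n, (v, layer) in new.items()
        if n in old and old[n][0] != v
    )
    return {"added": added, "removed": removed, "changed": changed}


base = ("sha256:base", {"openssl": "3.0.1", "zlib": "1.2.13", "curl": "8.0"})
app = ("sha256:app", {"openssl": "3.0.2", "curl": None, "jq": "1.7"})
result = diff_images([base], [base, app])
```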
3.4 Build‑time SBOMs (fast CI path)
- Buildx generator runs analyzers during docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer, attaches SBOMs as OCI referrers.
- Scanner.WebService can trust these (policy‑configurable) and skip re‑scan; DSSE + Rekor v2 can be done either at build time or post‑push via Signer/Attestor.
3.5 Events / integrations
- Out: report.ready (summary + verdict + Rekor UUID) → internal bus for Notify & UI.
- Expose: image‑level BOM‑Index metadata for Scheduler impact selection.
4) Backend evaluation (decider)
4.1 Concelier (advisories)
- Ingests vendor, distro, OSS feeds; normalizes & merges; persists canonical advisories in PostgreSQL; exports deterministic JSON and Trivy DB.
- Offline kit bundles for air‑gapped sites.
4.2 Excititor (VEX)
- Ingests OpenVEX / CSAF VEX / CycloneDX VEX; normalizes claims; retains conflicts; computes consensus with provider trust weights and justification gates.
4.3 Policy Engine (YAML DSL)
- Matchers: image/repo/env/purl/cve/vendor/source/path/layerDigest/usedByEntrypoint
- Actions: ignore(until, justification), fail, warn, defer, requireVEX{vendors, justifications}, escalate {sev, KEV, EPSS}, license constraints.
- Produces a policy digest (SHA‑256 of canonicalized policy).
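A policy digest of this kind can be illustrated as follows. The canonical form used here (JSON with sorted keys and compact separators) is an assumption made for the sketch; the document does not specify the actual canonicalization rules, and `policy_digest` is a hypothetical helper.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """SHA-256 over a canonical JSON rendering of an already-parsed policy.

    Assumed canonical form: keys sorted, no insignificant whitespace, UTF-8.
    """
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same policy content, different key order: the digest should not change.
policy_a = {"rules": [{"match": {"cve": "CVE-0000-0001"}, "action": "warn"}], "name": "default"}
policy_b = {"name": "default", "rules": [{"action": "warn", "match": {"cve": "CVE-0000-0001"}}]}
policy_c = {"name": "default", "rules": [{"action": "fail", "match": {"cve": "CVE-0000-0001"}}]}
```

Whatever the real rules are, the point is that two semantically identical policies must hash to the same value, so that the digest can be recorded in attestations and compared later.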
4.4 PASS/FAIL flow
- SBOM (Inventory / Usage) → join with Concelier advisories.
- Apply Excititor consensus (statuses & justifications).
- Apply Policy; compute PASS/FAIL with waiver TTLs.
- Sign the final report (DSSE via Signer) and log to Rekor v2 via Attestor.
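The first three steps can be sketched as a small decision function. This is only an illustration under simplifying assumptions: findings are already joined with advisories, VEX consensus is reduced to a single status per (purl, CVE) pair, and a waiver is just an expiry timestamp. The names `evaluate` and `SUPPRESSING_VEX` and the placeholder CVE identifiers are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: these consensus statuses mean the finding does not block.
SUPPRESSING_VEX = {"not_affected", "fixed"}


def evaluate(findings, vex, waivers, now):
    """Return (verdict, blocking) after applying VEX consensus and waiver TTLs."""
    blocking = []
    for finding in findings:
        key = (finding["purl"], finding["cve"])
        if vex.get(key) in SUPPRESSING_VEX:
            continue  # VEX consensus says the finding does not apply
        expires = waivers.get(key)
        if expires is not None and now < expires:
            continue  # waiver still within its TTL
        blocking.append(key)
    return ("FAIL" if blocking else "PASS"), blocking


now = datetime(2025, 12, 17, tzinfo=timezone.utc)
findings = [
    {"purl": "pkg:npm/example-a@1.0.0", "cve": "CVE-0000-0001"},
    {"purl": "pkg:pypi/example-b@2.0.0", "cve": "CVE-0000-0002"},
]
vex = {("pkg:npm/example-a@1.0.0", "CVE-0000-0001"): "not_affected"}
expired = {("pkg:pypi/example-b@2.0.0", "CVE-0000-0002"): datetime(2025, 12, 1, tzinfo=timezone.utc)}
active = {("pkg:pypi/example-b@2.0.0", "CVE-0000-0002"): datetime(2026, 1, 1, tzinfo=timezone.utc)}

verdict_expired = evaluate(findings, vex, expired, now)
verdict_active = evaluate(findings, vex, active, now)
```

An expired waiver lets the finding block again, which is why waiver TTLs are part of the verdict computation rather than a one-time filter.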
4A) Score Proofs & Deterministic Replay
4A.1 Overview
Score Proofs provide cryptographically verifiable audit trails for every scoring decision. They enable:
- Deterministic replay: Same inputs → same outputs, every time
- Audit compliance: Full traceability from inputs to final scores
- Offline verification: Proof bundles verifiable without network access
- Feed updates: Re-score historical scans with new advisories
4A.2 Scan Manifest
Every scan captures its inputs deterministically:
{
"scanId": "550e8400-e29b-41d4-a716-446655440000",
"createdAtUtc": "2025-12-17T12:00:00Z",
"artifactDigest": "sha256:abc123...",
"artifactPurl": "pkg:oci/myapp@sha256:abc123...",
"scannerVersion": "1.0.0",
"workerVersion": "1.0.0",
"concelierSnapshotHash": "sha256:feed123...",
"excititorSnapshotHash": "sha256:vex456...",
"latticePolicyHash": "sha256:policy789...",
"deterministic": true,
"seed": "AQIDBA==",
"knobs": {"maxDepth": "10"}
}
4A.3 Proof Ledger (DAG)
Scoring computation is recorded as a directed acyclic graph of ProofNode:
| Field | Description |
|---|---|
| id | Node identifier |
| kind | Input, Transform, Delta, Score |
| ruleId | Policy rule that produced this node |
| parentIds | Nodes this depends on |
| evidenceRefs | Links to supporting evidence |
| delta | Score contribution at this step |
| total | Cumulative score |
| nodeHash | Content-addressed hash |
The proof root hash is computed by hashing all leaf nodes' hashes in deterministic order.
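One way to picture this is the sketch below. Several details are assumptions, since the text does not define them: a leaf is taken to be a node that no other node lists in parentIds, each node hash covers the node's fields plus its parents' hashes, "deterministic order" is modeled as sorting the leaf hashes, and the root is the SHA-256 of their concatenation. The function names are hypothetical.

```python
import hashlib
import json


def node_hash(node, by_id, memo):
    """Content hash of a node: its own fields plus the hashes of its parents."""
    if node["id"] in memo:
        return memo[node["id"]]
    parent_hashes = sorted(node_hash(by_id[p], by_id, memo) for p in node["parentIds"])
    body = {k: v for k, v in node.items() if k not in ("nodeHash", "parentIds")}
    body["parentHashes"] = parent_hashes
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    memo[node["id"]] = digest
    return digest


def proof_root(nodes):
    """Hash the sorted hashes of all leaf nodes into a single root."""
    by_id = {n["id"]: n for n in nodes}
    memo = {}
    referenced = {p for n in nodes for p in n["parentIds"]}
    leaf_hashes = sorted(node_hash(n, by_id, memo) for n in nodes if n["id"] not in referenced)
    return "sha256:" + hashlib.sha256("".join(leaf_hashes).encode("ascii")).hexdigest()


nodes = [
    {"id": "n1", "kind": "Input", "ruleId": None, "parentIds": [], "evidenceRefs": [], "delta": 0.0, "total": 0.0},
    {"id": "n2", "kind": "Delta", "ruleId": "rule-a", "parentIds": ["n1"], "evidenceRefs": [], "delta": 0.4, "total": 0.4},
    {"id": "n3", "kind": "Score", "ruleId": "rule-b", "parentIds": ["n2"], "evidenceRefs": [], "delta": 0.1, "total": 0.5},
]
tampered = [dict(n) for n in nodes]
tampered[0]["delta"] = 9.9  # change an upstream input
```

Because each hash includes its parents' hashes, changing any upstream node changes the root, while the order in which nodes are listed does not.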
4A.4 Score Replay API
POST /api/v1/scanner/scans/{id}/score/replay
{ overrides?: { concelierSnapshotHash?, excititorSnapshotHash?, latticePolicyHash? } }
→ { scoreProof, rootHash, proofBundleUri }
Use cases:
- Feed updates: Re-score when Concelier publishes new advisories
- Policy changes: See impact of policy modifications
- Audit: Reproduce historical scores for compliance
4A.5 Proof Bundle Format
Proof bundles are self-contained ZIP archives:
proof-bundle.zip/
├── manifest.json # Canonical scan manifest
├── manifest.dsse.json # DSSE signature
├── score_proof.json # ProofNode[] array
├── proof_root.dsse.json # DSSE of proof root
└── meta.json # Timestamps, versions
Storage: scanner.proof_bundle table + RustFS for bundle files.
4B) Reachability Analysis
4B.1 Overview
Reachability Analysis determines whether vulnerable code is actually reachable from application entrypoints, reducing false positives by filtering unreachable vulnerabilities.
4B.2 Call Graph Ingestion
Language-specific workers extract call graphs:
{
"schema": "stella.callgraph.v1",
"language": "dotnet",
"nodes": [...], // Function definitions
"edges": [...], // Call relationships
"entrypoints": [...] // HTTP routes, gRPC, etc.
}
Supported languages: .NET, Java, Node.js, Python, Go, Rust
4B.3 Reachability Statuses
| Status | Confidence | Description |
|---|---|---|
| UNREACHABLE | High | No path from entrypoints to vulnerable code |
| POSSIBLY_REACHABLE | Medium | Path exists with heuristic edges |
| REACHABLE_STATIC | High | Static analysis proves path exists |
| REACHABLE_PROVEN | Very High | Runtime evidence confirms |
| UNKNOWN | Low | Insufficient data |
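The static statuses in the table can be illustrated with a breadth-first search over the call graph. This is a simplified sketch, not the analyzer's actual algorithm: edges are modeled as `(caller, callee, heuristic)` triples, where the `heuristic` flag is a hypothetical marker for edges that were inferred rather than resolved. REACHABLE_PROVEN and UNKNOWN are out of scope here because they depend on runtime evidence and data completeness.

```python
from collections import deque


def shortest_path(edges, sources, target):
    """Breadth-first search; returns the shortest call path or None."""
    adjacency = {}
    for caller, callee in edges:
        adjacency.setdefault(caller, []).append(callee)
    queue = deque([source] for source in sources)
    seen = set(sources)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in adjacency.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


def classify(edges, entrypoints, vulnerable):
    """Try resolved edges first, then fall back to including heuristic edges."""
    resolved = [(s, d) for s, d, heuristic in edges if not heuristic]
    path = shortest_path(resolved, entrypoints, vulnerable)
    if path:
        return "REACHABLE_STATIC", path
    path = shortest_path([(s, d) for s, d, _ in edges], entrypoints, vulnerable)
    if path:
        return "POSSIBLY_REACHABLE", path
    return "UNREACHABLE", []


edges = [
    ("main", "handler", False),
    ("handler", "parse", False),
    ("handler", "plugin", True),   # e.g. an unresolved indirect call
    ("plugin", "eval", True),
]
```

The returned path plays the role of the shortest path that an explanation of a finding would show.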
4B.4 Reachability API
POST /api/v1/scanner/scans/{id}/callgraphs
CallGraph → { callGraphDigest, status }
POST /api/v1/scanner/scans/{id}/reachability/compute
→ { jobId, status }
GET /api/v1/scanner/scans/{id}/reachability/findings
→ { findings[], summary }
GET /api/v1/scanner/scans/{id}/reachability/explain?cve=...&purl=...
→ { status, confidence, shortestPath[], whyReachable[] }
4B.5 Integration with Score Proofs
Reachability evidence is included in proof bundles:
- Reachability status per CVE/PURL is a scoring input
- Path evidence is referenced in proof nodes
- Graph attestations (DSSE) link to score proofs
Storage: scanner.cg_node, scanner.cg_edge, scanner.entrypoint tables.
4C) Unknowns Registry
4C.1 Overview
The Unknowns Registry tracks items that could not be fully classified due to missing evidence, enabling prioritized triage.
4C.2 Unknown Reasons
| Code | Description |
|---|---|
| missing_vex | No VEX statement for vulnerability |
| ambiguous_indirect_call | Indirect call target unresolved |
| incomplete_sbom | SBOM missing component data |
| missing_advisory | No advisory data for CVE |
| conflicting_evidence | Multiple conflicting data sources |
4C.3 2-Factor Ranking Model
Unknowns are ranked by:
score = 0.60 × blast + 0.30 × scarcity + 0.30 × pressure + containment_deduction
| Factor | Weight | Components |
|---|---|---|
| Blast Radius | 0.60 | Dependents, network exposure, privilege |
| Evidence Scarcity | 0.30 | Missing data severity |
| Exploit Pressure | 0.30 | EPSS, KEV status |
| Containment | -0.20 | Seccomp, read-only FS |
4C.4 Band Assignment
| Band | Score Range | SLA |
|---|---|---|
| HOT | ≥ 0.70 | 24 hours |
| WARM | 0.40 - 0.69 | 7 days |
| COLD | < 0.40 | 30 days |
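The ranking formula and band thresholds can be worked through as below. The sketch assumes each factor is normalized to [0, 1] and that the containment deduction ranges from -0.20 to 0. Because the listed weights sum to 1.20, the raw score can exceed 1.0; clamping to [0, 1] is an assumption of this sketch, as is treating WARM as the half-open range from 0.40 up to 0.70.

```python
def unknown_score(blast, scarcity, pressure, containment_deduction=0.0):
    """Weighted score from section 4C.3; inputs assumed normalized to [0, 1]."""
    raw = 0.60 * blast + 0.30 * scarcity + 0.30 * pressure + containment_deduction
    return max(0.0, min(1.0, raw))  # clamp is an assumption, not from the source


def band(score):
    """Map a score to a triage band using the thresholds in section 4C.4."""
    if score >= 0.70:
        return "HOT"
    if score >= 0.40:
        return "WARM"
    return "COLD"


exposed = unknown_score(blast=1.0, scarcity=0.5, pressure=0.0)
contained = unknown_score(blast=1.0, scarcity=0.5, pressure=0.0, containment_deduction=-0.20)
minor = unknown_score(blast=0.2, scarcity=0.2, pressure=0.2)
```

In this example the same unknown drops from HOT (0.75) to WARM (0.55) once the full containment deduction applies.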
4C.5 Unknowns API
GET /api/v1/unknowns
?band=HOT&sort=score → { items[], pagination }
GET /api/v1/unknowns/{id}
→ { id, reasons, blastRadius, score, scoreBreakdown }
GET /api/v1/unknowns/{id}/proof
→ { nodes[], rootHash }
POST /api/v1/unknowns/{id}/escalate
→ { rescanJobId, status }
POST /api/v1/unknowns/{id}/resolve
{ resolution, justification } → { resolvedAt }
Storage: policy.unknowns table with ranking metadata.
5) Runtime enforcement (Zastava)
- Observer: inventories running containers, checks image signatures, SBOM presence (referrers), detects drift (entrypoint chain divergence), flags unapproved images.
- Admission Webhook (optional): blocks policy‑fail pods (dry‑run first).
- Integration: posts runtime events to Scanner.WebService; can request delta scans on changed layers.
6) Storage & catalogs (RustFS/PostgreSQL)
RustFS layout (default)
rustfs://stellaops/
layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin
attest/<artifactSha256>.dsse.json
Database Architecture (PostgreSQL)
StellaOps uses PostgreSQL for all control-plane data with per-module schema isolation. Each module owns and manages only its own schema, ensuring clear ownership and independent migration lifecycles.
Schema topology:
┌─────────────────────────────────────────────────────────────────┐
│ PostgreSQL Cluster │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ stellaops (database) ││
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ││
│ │ │ auth │ │ vuln │ │ vex │ │scheduler│ ││
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ ││
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ││
│ │ │ notify │ │ policy │ │ audit │ ││
│ │ └─────────┘ └─────────┘ └─────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
Schema ownership:
| Schema | Owner Module | Purpose |
|---|---|---|
| auth | Authority | Identity, authentication, authorization, licensing, sessions |
| vuln | Concelier | Vulnerability advisories, CVSS, affected packages, sources |
| vex | Excititor | VEX statements, graphs, observations, evidence, consensus |
| scheduler | Scheduler | Jobs, triggers, workers, locks, execution history |
| notify | Notify | Channels, templates, rules, deliveries, escalations |
| policy | Policy | Policy packs, rules, risk profiles, evaluations |
| audit | Shared | Cross-cutting audit log (optional) |
Key design principles:
- Module isolation — Each module controls only its own schema. Cross-schema queries are rare and explicitly documented.
- Multi-tenancy — Single database, single schema set, tenant_id column on all tenant-scoped tables with row-level security.
- Forward-only migrations — No down migrations; fixes are applied as new forward migrations.
- Advisory lock coordination — Startup migrations use pg_try_advisory_lock(hashtext('schema_name')) to prevent concurrent execution.
- Air-gap compatible — All migrations embedded in assemblies, no external network dependencies.
Migration categories:
| Category | Prefix | Execution | Description |
|---|---|---|---|
| Startup (A) | 001-099 | Automatic at boot | Non-breaking DDL (CREATE IF NOT EXISTS, ADD COLUMN nullable) |
| Release (B) | 100-199 | Manual via CLI | Breaking changes (DROP, ALTER TYPE), require maintenance window |
| Seed | S001-S999 | After schema | Reference data with ON CONFLICT DO NOTHING |
| Data (C) | DM001-DM999 | Background job | Batched data transformations, resumable |
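The prefix ranges in the table lend themselves to a simple classifier. The sketch below assumes migration files are named with the prefix first (for example `001_initial_schema.sql`); that naming pattern and the function name are hypothetical, while the ranges come from the table.

```python
import re

# Optional "DM" or "S" marker followed by exactly three digits.
_PREFIX = re.compile(r"^(DM|S)?(\d{3})(?!\d)")


def migration_category(filename: str) -> str:
    """Classify a migration file by its numeric prefix."""
    match = _PREFIX.match(filename)
    if not match:
        raise ValueError(f"unrecognized migration name: {filename}")
    marker, number = match.group(1), int(match.group(2))
    if marker == "DM" and 1 <= number <= 999:
        return "data"
    if marker == "S" and 1 <= number <= 999:
        return "seed"
    if marker is None and 1 <= number <= 99:
        return "startup"
    if marker is None and 100 <= number <= 199:
        return "release"
    raise ValueError(f"prefix out of range: {filename}")


def is_rejected(filename: str) -> bool:
    """True when the name does not fall into any documented category."""
    try:
        migration_category(filename)
    except ValueError:
        return True
    return False
```

A check like this would let a startup runner skip anything that is not in the automatic category, leaving release migrations to the manual CLI path.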
Detailed documentation: See docs/db/ for full specification, coding rules, and phase-by-phase conversion tasks.
Operations guide: See docs/operations/postgresql-guide.md for performance tuning, monitoring, backup/restore, and scaling.
Retention
- RustFS applies retention via X-RustFS-Retain-Seconds; Scanner.WebService GC decrements refCount and deletes unreferenced metadata; S3/MinIO fallback retains native Object Lock when enabled.
- PostgreSQL retention managed via time-based partitioning for high-volume tables (runs, execution_logs) with monthly partition drops.
7) APIs (consolidated surface)
7.1 Scanner.WebService
POST /api/scans { imageRef|digest, force? } → { scanId }
GET /api/scans/{id} → { status, digests, artifacts[] }
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
GET /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
POST /api/reports { imageDigest, policyRevision?, vexSnapshot? } → { reportId, verdict, rekorUrl }
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
GET /healthz | /readyz | /metrics
7.2 Signer (mTLS; hard gate)
POST /sign/dsse # body: {subjectHash, imageDigest, predicate}; headers: OpTok (DPoP/mTLS) + PoE
GET /verify/referrers?imageDigest=sha256:... # is this image StellaOps-signed?
7.3 Attestor (mTLS)
POST /rekor/entries # DSSE bundle → {uuid, index, proof, logURL}
GET /rekor/entries/{uuid}
7.4 Authority (OIDC)
/.well-known/openid-configuration, /oauth/token (DPoP/mTLS), /oauth/introspect, /jwks
7.5 Licensing (cloud)
POST /license/enroll { LT, pubKey } → PoE + introspection endpoints
POST /license/revoke { license_id } → ok
POST /license/introspect { poe } → { active, claims, exp }
POST /attest/endorse { bundle } → endorsement bundle (optional)
7.6 Scheduler
POST /api/v1/scheduler/schedules {yaml|json} → { scheduleId }
GET /api/v1/scheduler/schedules → [ { id, nextRun, status, stats } ]
POST /api/v1/scheduler/run { id|selector } → { runId }
GET /api/v1/scheduler/runs/{id} → { status, counts, links }
GET /api/v1/scheduler/cursor → { lastConcelierExportId, lastExcititorExportId }
7.7 Notify
POST /api/v1/notify/test { channel, target } → { delivered }
POST /api/v1/notify/rules {yaml|json} → { ruleId }
GET /api/v1/notify/rules → [ { id, match, actions, enabled } ]
GET /api/v1/notify/deliveries → [ { id, eventId, channel, status, attempts } ]
8) Security & verifiability
- Sender‑constrained tokens. All operational calls use DPoP (RFC 9449) or mTLS‑bound tokens (RFC 8705).
- Entitlement. PoE is mandatory; revocation honored online.
- Release integrity. Signer independently verifies scanner image digest via Referrers + cosign before signing.
- Separation of duties. Scanner/UI/Scheduler/Notify cannot sign; only Signer can sign; only Attestor can write to Rekor v2.
- Verifiers. Anyone can verify: DSSE signature → certificate chain to Stella Ops Fulcio/KMS root → Rekor v2 inclusion.
- RBAC. Roles: scanner.admin|read, scheduler.admin|read, notify.admin|read, zastava.admin|read.
- Community vs Authorized. Free/community runs throttled with no official attestations; authorized runs full speed and produce Stella Ops‑verified bundles.
DSSE predicate (SBOM/report)
{
"predicateType": "https://stella-ops.org/attestations/sbom/1",
"subject": [{ "name": "s3://stellaops/images/<digest>/inventory.cdx.pb", "digest": { "sha256": "<sha256>" } }],
"predicate": {
"image_digest": "<sha256:...>",
"stellaops_version": "2.3.1 (2027.04)",
"license_id": "LIC-9F2A...",
"customer_id": "CUST-ACME",
"plan": "pro",
"policy_digest": "sha256:...",
"views": ["inventory","usage"],
"created": "2025-10-17T12:34:56Z"
}
}
BOM‑Index sidecar
Binary header + purl table + roaring bitmaps; optional usedByEntrypoint flags for fast policy joins.
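The join this sidecar enables can be sketched with plain Python integers used as bitsets. That is a stand-in for roaring bitmaps chosen to keep the example dependency-free; the binary header is omitted, and the `BomIndex` class and its methods are hypothetical.

```python
class BomIndex:
    """Toy index: a sorted purl table plus a bitmask of entrypoint-used components."""

    def __init__(self, purls, used_by_entrypoint):
        self.purls = sorted(set(purls))  # table position = component id
        self.ids = {purl: i for i, purl in enumerate(self.purls)}
        self.used = self._mask(used_by_entrypoint)

    def _mask(self, purls):
        mask = 0
        for purl in purls:
            component_id = self.ids.get(purl)
            if component_id is not None:  # ignore purls absent from this image
                mask |= 1 << component_id
        return mask

    def affected(self, advisory_purls, usage_only=False):
        """Components named by an advisory, optionally limited to the usage view."""
        hits = self._mask(advisory_purls)
        if usage_only:
            hits &= self.used
        return [purl for i, purl in enumerate(self.purls) if (hits >> i) & 1]


index = BomIndex(
    purls=["pkg:npm/a@1", "pkg:npm/b@2", "pkg:pypi/c@3"],
    used_by_entrypoint=["pkg:npm/b@2"],
)
advisory = ["pkg:npm/a@1", "pkg:npm/b@2", "pkg:npm/zzz@9"]
```

The usage-only query is a single bitwise AND, which is the property that makes this layout attractive for policy joins and for Scheduler impact selection.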
9) Scale, performance & quotas
- Workers: horizontal; distributed lock per layer digest; global CAS in MinIO.
- Queues: Redis Streams / NATS / RabbitMQ. HPA by queue depth, CPU, memory.
- Registry throttling: per‑registry concurrency budgets.
- Targets:
  - Build‑time path p95 ≤ 3–5 s on warmed bases.
  - Post‑build delta scan p95 ≤ 10 s for 200 MB images.
  - Policy + VEX evaluation ≤ 500 ms for 5k components using BOM‑Index.
  - Event → notification p95 ≤ 30–60 s under nominal load.
  - Export delta → re‑evaluation verdict p95 ≤ 5 min for 10k impacted images.
- Quotas: license plan enforces QPS/concurrency/size; Signer throttles and can deny DSSE.
10) DevOps & distribution
- Releases: all first‑party images cosign‑signed; labels embed org.stellaops.version and org.stellaops.release_date.
- Channels:
  - Community (public registry): throttled, non‑attesting.
  - Authorized (private registry): full speed, DSSE enabled.
- Client update flow: containers self‑verify signatures at boot; report version; Signer enforces valid_release_year / max_version from PoE before signing.
- Compose skeleton:
    services:
      authority:        { image: stellaops/authority, depends_on: [postgres] }
      fulcio:           { image: sigstore/fulcio }
      rekor:            { image: sigstore/rekor-v2 }
      minio:            { image: minio/minio, command: server /data --console-address ":9001" }
      postgres:         { image: postgres:16-alpine, environment: { POSTGRES_DB: stellaops, POSTGRES_USER: stellaops } }
      signer:           { image: stellaops/signer, depends_on: [authority, fulcio] }
      attestor:         { image: stellaops/attestor, depends_on: [rekor, signer] }
      scanner-web:      { image: stellaops/scanner-web, depends_on: [postgres, minio, signer, attestor] }
      scanner-worker:   { image: stellaops/scanner-worker, deploy: { replicas: 4 }, depends_on: [scanner-web] }
      concelier:        { image: stellaops/concelier-web, depends_on: [postgres] }
      excititor:        { image: stellaops/excititor-web, depends_on: [postgres] }
      scheduler-web:    { image: stellaops/scheduler-web, depends_on: [postgres] }
      scheduler-worker: { image: stellaops/scheduler-worker, deploy: { replicas: 2 }, depends_on: [scheduler-web] }
      notify-web:       { image: stellaops/notify-web, depends_on: [postgres] }
      notify-worker:    { image: stellaops/notify-worker, deploy: { replicas: 2 }, depends_on: [notify-web] }
      ui:               { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor, scheduler-web, notify-web] }
- Binary prerequisites (offline-first):
  - NuGet packages restore from standard feeds configured in nuget.config (dotnet-public, nuget-mirror, nuget.org) to the global NuGet cache. For air-gapped environments, use dotnet restore --source <offline-feed-path> pointing to a local .nupkg mirror.
  - Non-NuGet binaries (plugins/CLIs/tools) are catalogued with SHA-256 in vendor/manifest.json; air-gap bundles are registered in offline/feeds/manifest.json.
  - CI guard: scripts/verify-binaries.sh blocks binaries outside approved roots; offline restores use dotnet restore --source <offline-feed> with OFFLINE=1 (override via ALLOW_REMOTE=1).
- Backups: PostgreSQL dumps (pg_dump) and WAL archiving; RustFS snapshots (or S3 versioning when fallback driver is used); Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation. See docs/operations/postgresql-guide.md.
- Ops runbooks: Scheduler catch‑up after Concelier/Excititor recovery; connector key rotation (Slack/Teams/SMTP).
- SLOs & alerts: lag between Concelier/Excititor export and first rescan verdict; delivery failure rates by channel.
11) Observability & audit
- Metrics: scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
- Scheduler metrics: scheduler.impacted_images_total, scheduler.jobs_enqueued_total, scheduler.selection_ms, end‑to‑end p95 (event → verdict).
- Notify metrics: notify.sent_total{channel}, notify.dropped_total{reason}, notify.digest_coalesced_total, notify.latency_ms.
- Tracing: per‑stage spans; correlation IDs across Scanner→Signer→Attestor and Concelier/Excititor→Scheduler→Scanner→Notify.
- Audit logs: every signing records license_id, image_digest, policy_digest, and Rekor UUID; Scheduler records who scheduled what; Notify records where, when, and why messages were sent or deduped.
- Compliance: RustFS retention headers (or MinIO Object Lock when operating in S3 mode) keep immutable artifacts tamper‑resistant; reproducible outputs via policy digest + SBOM digest in predicate.
12) Roadmap (anchored to this architecture)
- M2: Windows MSI/SxS/GAC analyzers; deeper Rust (DWARF enrichers).
- M2: Buildx generator certified flows; cross‑registry trust policies.
- M3: Patch‑Presence plugin (signature‑based backport detection), opt‑in.
- M3: Zastava Admission control GA with policy presets and dry‑run→enforce stages.
- M3: Scheduler GA with export‑delta impact routing and capacity‑aware pacing.
- M3: Notify GA with digests, Slack/Teams/Email/Webhooks; M4: PagerDuty/Opsgenie connectors.
- Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.
13) Canonical sequences (verification, re‑evaluation & notify)
Sign & log (OpTok + PoE, image verify, DSSE, Rekor).
sequenceDiagram
autonumber
participant Scan as Scanner.WebService
participant Auth as Authority (OIDC)
participant Sign as Signer
participant Reg as OCI Registry
participant Ful as Fulcio/KMS
participant Att as Attestor
participant Rek as Rekor v2
Scan->>Auth: Get OpTok (DPoP/mTLS)
Scan->>Sign: sign(request) + OpTok + PoE + DPoP proof
Sign->>Auth: Validate OpTok & sender-constraint
Sign->>Sign: Validate PoE (introspect/revocation)
Sign->>Reg: Verify scanner image is StellaOps-signed (Referrers + cosign)
alt OK
Sign->>Ful: Get signing cert (keyless) or use KMS key
Sign-->>Scan: DSSE bundle (cert chain)
Scan->>Att: Submit bundle
Att-->>Rek: Create entry
Rek-->>Att: {uuid,index,proof}
Att-->>Scan: Rekor URL
else Deny
Sign-->>Scan: 403 (no attestation)
end
Event‑driven re‑evaluation & notify.
sequenceDiagram
participant CONC as Concelier
participant EXC as Excititor
participant SCH as Scheduler
participant SC as Scanner.WebService
participant NO as Notify
CONC->>SCH: export.delta {changedProductKeys, exportId}
EXC ->>SCH: export.delta {changedProductKeys, exportId}
SCH->>SCH: Impact select via BOM-Index bitmaps
SCH->>SC: Enqueue analysis-only reports (batches)
SC-->>SCH: verdict stream (PASS/FAIL, deltas)
SCH->>NO: rescan.delta {imageDigest, newCriticals, links}
NO-->>Slack/Teams/Email/Webhook: deliver (throttle/digest rules applied)
14) Minimal data shapes (Scheduler & Notify)
Scheduler schedule (YAML via UI/CLI)
    name: nightly-eu
    when: "0 2 * * * Europe/Sofia"
    mode: analysis-only            # or content-refresh
    selection:
      scope: all-images            # or tenant/ns/repo label selectors
      onlyIf: { lastReportOlderThanDays: 7 }
    notify:
      onNewFindings: true
      minSeverity: high
    limits:
      maxJobs: 5000
      ratePerSecond: 50
Notify rule (YAML)
    name: high-critical-alerts
    match:
      eventKinds: ["report.ready","rescan.delta","zastava.admission"]
      minSeverity: high
      namespaces: ["prod-*"]
      vex: { includeAcceptedJustifications: false }
    actions:
      - channel: slack
        target: "#sec-alerts"
        template: "concise"
        throttle: "5m"
      - channel: email
        target: "soc@acme.org"
        digest: "hourly"
    enabled: true
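How such a rule might be matched against an incoming event is sketched below. The event shape (`kind`, `severity`, `namespace`), the severity ordering, and the use of shell-style globbing for `namespaces` are assumptions made for illustration; the `vex` filter, throttling, and digests are left out.

```python
from fnmatch import fnmatchcase

# Assumed ordering, lowest to highest; not specified in the rule format itself.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

RULE = {
    "name": "high-critical-alerts",
    "match": {
        "eventKinds": ["report.ready", "rescan.delta", "zastava.admission"],
        "minSeverity": "high",
        "namespaces": ["prod-*"],
    },
    "enabled": True,
}


def rule_matches(rule, event):
    """True when the event passes the rule's kind, severity, and namespace filters."""
    if not rule.get("enabled", True):
        return False
    match = rule["match"]
    if event["kind"] not in match["eventKinds"]:
        return False
    if SEVERITY_ORDER.index(event["severity"]) < SEVERITY_ORDER.index(match["minSeverity"]):
        return False
    return any(fnmatchcase(event["namespace"], pattern) for pattern in match["namespaces"])


prod_critical = {"kind": "report.ready", "severity": "critical", "namespace": "prod-eu"}
staging_critical = {"kind": "report.ready", "severity": "critical", "namespace": "staging"}
prod_medium = {"kind": "rescan.delta", "severity": "medium", "namespace": "prod-eu"}
```

Only the first event would be handed to the rule's actions; the other two are filtered out by namespace and severity respectively.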