Full implementation plan (first draft)
@@ -1,8 +1,3 @@
Below is the **revised, consolidated** `high_level_architecture.md`.
It **absorbs** all content from `components.md` so you have a single, authoritative file. No separate components doc is required.

---

# High‑Level Architecture — **Stella Ops** (Consolidated • 2025Q4)

> **Purpose.** A complete, implementation‑ready map of Stella Ops: product vision, all runtime components, trust boundaries, tokens/licensing, control/data flows, storage, APIs, security, scale, DevOps, and verification logic.
@@ -30,28 +25,32 @@ It **absorbs** all content from `components.md` so you have a single, authoritat
|
||||
|
||||
### 1.1 Runtime inventory (first‑party)
|
||||
|
||||
| Service / Tool | Container image | Core role | Scale pattern |
| --- | --- | --- | --- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach‑O, EntryTrace); emits per‑layer SBOMs and composes image SBOMs. | Horizontal; queue‑driven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for build‑time SBOMs as OCI **referrers**. | CI‑side; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLI‑orchestrated scanner container for post‑build scans. | Local/CI; ephemeral. |
| **Concelier.WebService** | `stellaops/concelier-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| **Excititor.WebService** | `stellaops/excititor-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage‑gating); produces **policy digest**. | In‑process; cache per digest. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | On‑prem OIDC issuing **short‑lived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper. | Local/CI. |
|
||||
| Service / Tool | Container image | Core role | Scale pattern |
| --- | --- | --- | --- |
| **Scanner.WebService** | `stellaops/scanner-web` | Control plane for scans; catalog; SBOM composition (inventory & usage); diff; exports; **analysis‑only report runs** for Scheduler. | Stateless; N replicas behind LB. |
| **Scanner.Worker** | `stellaops/scanner-worker` | Runs analyzers (OS, Lang: Java/Node/Python/Go/.NET/Rust, Native ELF/PE/Mach‑O, EntryTrace); emits per‑layer SBOMs and composes image SBOMs. | Horizontal; queue‑driven; sharded by layer digest. |
| **Scanner.Sbomer.BuildXPlugin** | `stellaops/sbom-indexer` | BuildKit **generator** for build‑time SBOMs as OCI **referrers**. | CI‑side; ephemeral. |
| **Scanner.Sbomer.DockerImage** | `stellaops/scanner-cli` | CLI‑orchestrated scanner container for post‑build scans. | Local/CI; ephemeral. |
| **Concelier.WebService** | `stellaops/concelier-web` | Vulnerability ingest/normalize/merge/export (JSON + Trivy DB). | HA via Mongo locks. |
| **Excititor.WebService** | `stellaops/excititor-web` | VEX ingest/normalize/consensus; conflict retention; exports. | HA via Mongo locks. |
| **Policy Engine** | (in `scanner-web`) | YAML DSL evaluator (waivers, vendor preferences, KEV/EPSS, license, usage‑gating); produces **policy digest**. | In‑process; cache per digest. |
| **Scheduler.WebService** | `stellaops/scheduler-web` | Schedules **re‑evaluation** runs; consumes Concelier/Excititor deltas; selects **impacted images** via BOM‑Index; orchestrates analysis‑only reports. | Stateless API. |
| **Scheduler.Worker** | `stellaops/scheduler-worker` | Executes selection and enqueues batches toward Scanner; enforces rate/limits and windows; maintains impact cursors. | Horizontal; queue‑driven. |
| **Notify.WebService** | `stellaops/notify-web` | Rules engine for outbound notifications; manages channels, templates, throttle/digest logic. | Stateless API. |
| **Notify.Worker** | `stellaops/notify-worker` | Delivers to Slack/Teams/Email/Webhooks; idempotent retries; digests. | Horizontal; per‑channel rate limits. |
| **Signer** | `stellaops/signer` | **Hard gate:** validates entitlement + release integrity; mints signing cert (Fulcio keyless) or uses KMS; signs DSSE. | Stateless; HPA by QPS. |
| **Attestor** | `stellaops/attestor` | Posts DSSE bundles to **Rekor v2**; verification endpoints. | Stateless; HPA by QPS. |
| **Authority** | `stellaops/authority` | On‑prem OIDC issuing **short‑lived OpToks** with DPoP/mTLS sender constraint. | HA behind LB. |
| **Zastava** (Runtime) | `stellaops/zastava` | Runtime inspector/enforcer (observer + optional Admission Webhook). | DaemonSet + Webhook. |
| **Web UI** | `stellaops/ui` | Angular app for scans, diffs, policy, VEX, **Scheduler**, **Notify**, runtime, reports. | Stateless. |
| **StellaOps.Cli** | `stellaops/cli` | CLI for init/scan/export/diff/policy/report/verify; Buildx helper; **schedule** and **notify** verbs. | Local/CI. |
|
||||
|
||||
### 1.2 Third‑party (self‑hosted)
|
||||
|
||||
* **Fulcio** (Sigstore CA) — issues short‑lived signing certs (keyless).
|
||||
* **Rekor v2** (tile‑backed transparency log).
|
||||
* **MinIO** — S3‑compatible object store with lifecycle & Object Lock.
|
||||
* **MongoDB** — catalog, advisories, VEX.
|
||||
* **MongoDB** — catalog, advisories, VEX, scheduler, notify.
|
||||
* **Queue** — Redis Streams / NATS / RabbitMQ (pluggable).
|
||||
* **OCI Registry** — must support **Referrers API** (discover SBOMs/signatures).
|
||||
|
||||
@@ -71,8 +70,12 @@ flowchart LR
|
||||
Auth[Authority (OIDC)\nOpTok (DPoP/mTLS)]
|
||||
SW[Scanner.WebService]
|
||||
WK[Scanner.Worker xN]
|
||||
FEED[Concelier]
|
||||
VEX[Excititor]
|
||||
CONC[Concelier]
|
||||
EXC[Excititor]
|
||||
SCHW[Scheduler.Web]
|
||||
SCH[Scheduler.Worker xN]
|
||||
NOTW[Notify.Web]
|
||||
NOT[Notify.Worker xN]
|
||||
POL[Policy Engine (in Scanner.Web)]
|
||||
SGN[Signer\n(entitlement + signing)]
|
||||
ATT[Attestor\n(Rekor v2 submit/verify)]
|
||||
@@ -93,11 +96,19 @@ flowchart LR
|
||||
QUE --> WK
|
||||
WK --> MIN
|
||||
SW --> MGO
|
||||
FEED --> MGO
|
||||
VEX --> MGO
|
||||
CONC --> MGO
|
||||
EXC --> MGO
|
||||
UI --> SW
|
||||
Z --> SW
|
||||
|
||||
%% New event-driven loop
|
||||
CONC -- export.delta --> SCHW
|
||||
EXC -- export.delta --> SCHW
|
||||
SCHW --> SCH
|
||||
SCH --> SW
|
||||
SW -- report.ready --> NOTW
|
||||
Z -- admission/observe --> NOTW
|
||||
|
||||
SGN <--> Auth
|
||||
SGN --> FUL
|
||||
SGN -->|mTLS| ATT
|
||||
@@ -106,7 +117,7 @@ flowchart LR
|
||||
SGN <-->|verify referrers| REG
|
||||
```
|
||||
|
||||
**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI never sign.
|
||||
**Trust boundaries.** Only **Signer** can sign; only **Attestor** can write to **Rekor v2**. Scanner/UI/Scheduler/Notify never sign.
|
||||
|
||||
---
|
||||
|
||||
@@ -116,7 +127,7 @@ flowchart LR
|
||||
|
||||
* **License Token (LT)** — long‑lived JWT from **Licensing Service**; used **once** to enroll the installation; never used in hot path.
|
||||
* **Proof‑of‑Entitlement (PoE)** — bound to the installation key (mTLS client cert **or** DPoP‑bound JWT with `cnf`); medium‑lived; renewable; revocable.
|
||||
* **Operational token (OpTok)** — 2–5 min OIDC token from **Authority**, **sender‑constrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**.
|
||||
* **Operational token (OpTok)** — 2–5 min OIDC token from **Authority**, **sender‑constrained** (DPoP or mTLS). Used to authenticate to **Signer**/**Scanner.WebService**/**Scheduler.Web**/**Notify.Web**.
|
||||
|
||||
**Signer enforces both:** PoE proves entitlement; OpTok proves “who is calling now”. It also **independently verifies** the **scanner image digest** is **Stella Ops‑signed** via **Referrers + cosign** before signing anything.
|
||||
|
||||
@@ -173,6 +184,11 @@ LS --> IA: PoE (mTLS client cert or JWT with cnf=K_inst), CRL/OCSP/introspect
|
||||
* Buildx **generator** runs analyzers during `docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer`, attaches SBOMs as **OCI referrers**.
|
||||
* Scanner.WebService can trust these (policy‑configurable) and **skip** re‑scan; DSSE + Rekor v2 can be done either at build time or post‑push via Signer/Attestor.
|
||||
|
||||
### 3.5 Events / integrations
|
||||
|
||||
* **Out:** `report.ready` (summary + verdict + Rekor UUID) → internal bus for **Notify** & UI.
|
||||
* **Expose:** image‑level **BOM‑Index** metadata for **Scheduler** impact selection.
|
||||
|
||||
---
|
||||
|
||||
## 4) Backend evaluation (decider)
|
||||
@@ -227,6 +243,8 @@ s3://stellaops/
|
||||
|
||||
* `artifacts` (type/format/sha/size/rekor/ttl/immutable/refCount/createdAt)
|
||||
* `images`, `layers`, `links`, `lifecycleRules`
|
||||
* **Scheduler:** `schedules`, `runs`, `locks`, `impact_cursors`
|
||||
* **Notify:** `rules`, `deliveries`, `channels`, `templates`
|
||||
|
||||
**Retention**
|
||||
|
||||
@@ -239,13 +257,13 @@ s3://stellaops/
|
||||
### 7.1 Scanner.WebService
|
||||
|
||||
```
|
||||
POST /api/scans { imageRef|digest, force? } → { scanId }
|
||||
GET /api/scans/{id} → { status, digests, artifacts[] }
|
||||
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
|
||||
POST /api/scans { imageRef|digest, force? } → { scanId }
|
||||
GET /api/scans/{id} → { status, digests, artifacts[] }
|
||||
GET /api/sboms/{imageDigest} ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage
|
||||
GET /api/diff?old=<digest>&new=<digest> → { added[], removed[], changed[], byLayer[] }
|
||||
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
|
||||
POST /api/reports { imageDigest, policyRevision? } → { reportId, rekorUrl }
|
||||
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
|
||||
POST /api/exports { imageDigest, format, view } → { artifactId, rekorUrl }
|
||||
POST /api/reports { imageDigest, policyRevision?, vexSnapshot? } → { reportId, verdict, rekorUrl }
|
||||
GET /api/catalog/artifacts/{id} → { size, ttl, immutable, rekor, refs }
|
||||
GET /healthz | /readyz | /metrics
|
||||
```
|
||||
|
||||
@@ -276,6 +294,25 @@ POST /license/introspect { poe } → { active, claims, exp }
|
||||
POST /attest/endorse { bundle } → endorsement bundle (optional)
|
||||
```
|
||||
|
||||
### 7.6 Scheduler
|
||||
|
||||
```
POST /api/v1/scheduler/schedules   {yaml|json}       → { scheduleId }
GET  /api/v1/scheduler/schedules                     → [ { id, nextRun, status, stats } ]
POST /api/v1/scheduler/run         { id|selector }   → { runId }
GET  /api/v1/scheduler/runs/{id}                     → { status, counts, links }
GET  /api/v1/scheduler/cursor                        → { lastConcelierExportId, lastExcititorExportId }
```
|
||||
|
||||
### 7.7 Notify
|
||||
|
||||
```
POST /api/v1/notify/test        { channel, target }  → { delivered }
POST /api/v1/notify/rules       {yaml|json}          → { ruleId }
GET  /api/v1/notify/rules                            → [ { id, match, actions, enabled } ]
GET  /api/v1/notify/deliveries                       → [ { id, eventId, channel, status, attempts } ]
```
|
||||
|
||||
---
|
||||
|
||||
## 8) Security & verifiability
|
||||
@@ -283,8 +320,9 @@ POST /attest/endorse { bundle } → endorsement bundle (optio
|
||||
* **Sender‑constrained tokens.** All operational calls use **DPoP** (RFC 9449) or **mTLS‑bound** tokens (RFC 8705).
|
||||
* **Entitlement.** **PoE** is mandatory; revocation honored online.
|
||||
* **Release integrity.** **Signer** independently verifies **scanner image digest** via **Referrers + cosign** before signing.
|
||||
* **Separation of duties.** Scanner/UI cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
|
||||
* **Separation of duties.** Scanner/UI/Scheduler/Notify cannot sign; only **Signer** can sign; only **Attestor** can write to **Rekor v2**.
|
||||
* **Verifiers.** Anyone can verify: DSSE signature → certificate chain to **Stella Ops Fulcio/KMS root** → **Rekor v2** inclusion.
|
||||
* **RBAC.** Roles: `scanner.admin|read`, `scheduler.admin|read`, `notify.admin|read`, `zastava.admin|read`.
|
||||
* **Community vs Authorized.** Free/community runs throttled with no official attestations; authorized runs full speed and produce **Stella Ops‑verified** bundles.
|
||||
|
||||
**DSSE predicate (SBOM/report)**
|
||||
@@ -321,6 +359,8 @@ Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags
|
||||
* Build‑time path P95 ≤ 3–5 s on warmed bases.
|
||||
* Post‑build delta scan P95 ≤ 10 s for 200 MB images.
|
||||
* Policy + VEX evaluation ≤ 500 ms for 5k components using BOM‑Index.
|
||||
* **Event → notification** p95 ≤ **30–60 s** under nominal load.
|
||||
* **Export delta → re‑evaluation verdict** p95 ≤ **5 min** for 10k impacted images.
|
||||
* **Quotas:** license plan enforces QPS/concurrency/size; **Signer** throttles and can deny DSSE.
|
||||
|
||||
---
|
||||
@@ -337,32 +377,37 @@ Binary header + purl table + roaring bitmaps; optional `usedByEntrypoint` flags
|
||||
|
||||
```yaml
|
||||
services:
|
||||
authority: { image: stellaops/authority }
|
||||
fulcio: { image: sigstore/fulcio }
|
||||
rekor: { image: sigstore/rekor-v2 }
|
||||
minio: { image: minio/minio, command: server /data --console-address ":9001" }
|
||||
mongo: { image: mongo:7 }
|
||||
signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
|
||||
attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
|
||||
scanner-web:{ image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
|
||||
scanner-worker:
|
||||
image: stellaops/scanner-worker
|
||||
deploy: { replicas: 4 }
|
||||
depends_on: [scanner-web]
|
||||
concelier: { image: stellaops/concelier-web, depends_on: [mongo] }
|
||||
excititor: { image: stellaops/excititor-web, depends_on: [mongo] }
|
||||
ui: { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor] }
|
||||
authority: { image: stellaops/authority }
|
||||
fulcio: { image: sigstore/fulcio }
|
||||
rekor: { image: sigstore/rekor-v2 }
|
||||
minio: { image: minio/minio, command: server /data --console-address ":9001" }
|
||||
mongo: { image: mongo:7 }
|
||||
signer: { image: stellaops/signer, depends_on: [authority, fulcio] }
|
||||
attestor: { image: stellaops/attestor, depends_on: [rekor, signer] }
|
||||
scanner-web: { image: stellaops/scanner-web, depends_on: [mongo, minio, signer, attestor] }
|
||||
scanner-worker: { image: stellaops/scanner-worker, deploy: { replicas: 4 }, depends_on: [scanner-web] }
|
||||
concelier: { image: stellaops/concelier-web, depends_on: [mongo] }
|
||||
excititor: { image: stellaops/excititor-web, depends_on: [mongo] }
|
||||
scheduler-web: { image: stellaops/scheduler-web, depends_on: [mongo] }
|
||||
scheduler-worker:{ image: stellaops/scheduler-worker, deploy: { replicas: 2 }, depends_on: [scheduler-web] }
|
||||
notify-web: { image: stellaops/notify-web, depends_on: [mongo] }
|
||||
notify-worker: { image: stellaops/notify-worker, deploy: { replicas: 2 }, depends_on: [notify-web] }
|
||||
ui: { image: stellaops/ui, depends_on: [scanner-web, concelier, excititor, scheduler-web, notify-web] }
|
||||
```
|
||||
|
||||
* **Backups:** Mongo dumps; MinIO versioned buckets & replication; Rekor v2 DB snapshots; JWKS/Fulcio/KMS key rotation.
|
||||
* **Ops runbooks:** Scheduler catch‑up after Concelier/Excititor recovery; connector key rotation (Slack/Teams/SMTP).
|
||||
* **SLOs & alerts:** lag between Concelier/Excititor export and first rescan verdict; delivery failure rates by channel.
|
||||
|
||||
---
|
||||
|
||||
## 11) Observability & audit
|
||||
|
||||
* **Metrics:** scan latency, layer cache hit %, artifact bytes, DSSE/Rekor latency, policy evaluation time, queue depth, admission decisions (Zastava).
|
||||
* **Tracing:** per‑stage spans; correlation IDs across Scanner→Signer→Attestor.
|
||||
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID.
|
||||
* **Scheduler metrics:** `scheduler.impacted_images_total`, `scheduler.jobs_enqueued_total`, `scheduler.selection_ms`, end‑to‑end p95 (event → verdict).
|
||||
* **Notify metrics:** `notify.sent_total{channel}`, `notify.dropped_total{reason}`, `notify.digest_coalesced_total`, `notify.latency_ms`.
|
||||
* **Tracing:** per‑stage spans; correlation IDs across Scanner→Signer→Attestor and Concelier/Excititor→Scheduler→Scanner→Notify.
|
||||
* **Audit logs:** every signing records `license_id`, `image_digest`, `policy_digest`, and Rekor UUID; Scheduler records who scheduled what; Notify records where, when, and why messages were sent or deduped.
|
||||
* **Compliance:** MinIO **Object Lock** for immutable artifacts; reproducible outputs via policy digest + SBOM digest in predicate.
|
||||
|
||||
---
|
||||
@@ -373,11 +418,13 @@ services:
|
||||
* M2: Buildx generator certified flows; cross‑registry trust policies.
|
||||
* M3: Patch‑Presence plugin (signature‑based backport detection), opt‑in.
|
||||
* M3: Zastava Admission control GA with policy presets and dry‑run→enforce stages.
|
||||
* M3: **Scheduler GA** with export‑delta impact routing and capacity‑aware pacing.
|
||||
* M3: **Notify GA** with digests, Slack/Teams/Email/Webhooks; **M4:** PagerDuty/Opsgenie connectors.
|
||||
* Continuous: Policy UX (waiver TTLs, vendor rules), Excititor connectors expansion.
|
||||
|
||||
---
|
||||
|
||||
## 13) Canonical sequences (verification & signing)
|
||||
## 13) Canonical sequences (verification, re‑evaluation & notify)
|
||||
|
||||
**Sign & log (OpTok + PoE, image verify, DSSE, Rekor).**
|
||||
|
||||
@@ -409,22 +456,62 @@ sequenceDiagram
|
||||
end
|
||||
```
|
||||
|
||||
**Verification (third party).**
|
||||
**Event‑driven re‑evaluation & notify.**
|
||||
|
||||
```plantuml
|
||||
@startuml
|
||||
actor Verifier
|
||||
participant "stellaops verify" as Tool
|
||||
database "Fulcio/KMS root" as Root
|
||||
participant "Rekor v2" as R2
|
||||
Verifier -> Tool: bundle (URL/file)
|
||||
Tool -> Tool: Verify DSSE signature
|
||||
Tool -> Root: Verify cert chain to StellaOps root
|
||||
Tool -> R2: Verify inclusion proof / query by UUID
|
||||
Tool -> Verifier: OK + claims (license_id, policy_digest, version)
|
||||
@enduml
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant CONC as Concelier
|
||||
participant EXC as Excititor
|
||||
participant SCH as Scheduler
|
||||
participant SC as Scanner.WebService
|
||||
participant NO as Notify
|
||||
|
||||
CONC->>SCH: export.delta {changedProductKeys, exportId}
|
||||
EXC ->>SCH: export.delta {changedProductKeys, exportId}
|
||||
SCH->>SCH: Impact select via BOM-Index bitmaps
|
||||
SCH->>SC: Enqueue analysis-only reports (batches)
|
||||
SC-->>SCH: verdict stream (PASS/FAIL, deltas)
|
||||
SCH->>NO: rescan.delta {imageDigest, newCriticals, links}
|
||||
NO-->>Slack/Teams/Email/Webhook: deliver (throttle/digest rules applied)
|
||||
```
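
The "impact select via BOM-Index bitmaps" step in the sequence above could work roughly as follows. This is a minimal sketch, not the actual BOM-Index format: plain Python integers stand in for roaring bitmaps, component keys are assumed to be purls, and `build_index` and `impacted_images` are hypothetical names.

```python
def build_index(images: dict) -> tuple:
    """Build a toy index from imageDigest -> set of component keys.

    Returns (image_ids, bitmaps), where bit `pos` of bitmaps[key] is set
    when image_ids[pos] contains that component key.
    """
    image_ids = sorted(images)
    bitmaps = {}
    for pos, digest in enumerate(image_ids):
        for key in images[digest]:
            bitmaps[key] = bitmaps.get(key, 0) | (1 << pos)
    return image_ids, bitmaps


def impacted_images(image_ids: list, bitmaps: dict, changed_keys: list) -> list:
    """OR together the bitmaps of all changed keys and list the images hit."""
    mask = 0
    for key in changed_keys:
        mask |= bitmaps.get(key, 0)  # unknown keys impact nothing
    return [d for pos, d in enumerate(image_ids) if (mask >> pos) & 1]
```

Selection cost then scales with the number of changed keys rather than with the number of stored SBOMs, which is presumably why the delta events carry `changedProductKeys`.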
|
||||
|
||||
---
|
||||
|
||||
**End of `high_level_architecture.md` (Consolidated).**
|
||||
## 14) Minimal data shapes (Scheduler & Notify)
|
||||
|
||||
**Scheduler schedule (YAML via UI/CLI)**
|
||||
|
||||
```yaml
name: nightly-eu
when: "0 2 * * * Europe/Sofia"
mode: analysis-only          # or content-refresh
selection:
  scope: all-images          # or tenant/ns/repo label selectors
  onlyIf: { lastReportOlderThanDays: 7 }
notify:
  onNewFindings: true
  minSeverity: high
limits:
  maxJobs: 5000
  ratePerSecond: 50
```
|
||||
|
||||
**Notify rule (YAML)**
|
||||
|
||||
```yaml
name: high-critical-alerts
match:
  eventKinds: ["report.ready","rescan.delta","zastava.admission"]
  minSeverity: high
  namespaces: ["prod-*"]
  vex: { includeAcceptedJustifications: false }
actions:
  - channel: slack
    target: "#sec-alerts"
    template: "concise"
    throttle: "5m"
  - channel: email
    target: "soc@acme.org"
    digest: "hourly"
enabled: true
```
|
||||
|
||||
@@ -37,6 +37,8 @@ src/
|
||||
|
||||
**Language/runtime**: .NET 10 **Native AOT** for speed/startup; Linux builds use **musl** static when possible.
|
||||
|
||||
**Plug-in verbs.** Non-core verbs (Excititor, runtime helpers, future integrations) ship as restart-time plug-ins under `plugins/cli/**` with manifest descriptors. The launcher loads plug-ins on startup; hot reloading is intentionally unsupported.
|
||||
|
||||
**OS targets**: linux‑x64/arm64, windows‑x64/arm64, macOS‑x64/arm64.
|
||||
|
||||
---
|
||||
@@ -386,4 +388,3 @@ script:
|
||||
* macOS: 13–15 (x64, arm64).
|
||||
* Windows: 10/11, Server 2019/2022 (x64, arm64).
|
||||
* Docker engines: Docker Desktop, containerd‑based runners.
|
||||
|
||||
|
||||
456
docs/ARCHITECTURE_NOTIFY.md
Normal file
@@ -0,0 +1,456 @@
|
||||
> **Scope.** Implementation‑ready architecture for **Notify**: a rules‑driven, tenant‑aware notification service that consumes platform events (scan completed, report ready, rescan deltas, attestation logged, admission decisions, etc.), evaluates operator‑defined routing rules, renders **channel‑specific messages** (Slack/Teams/Email/Webhook), and delivers them **reliably** with idempotency, throttling, and digests. It is UI‑managed, auditable, and safe by default (no secrets leakage, no spam storms).
|
||||
|
||||
---
|
||||
|
||||
## 0) Mission & boundaries
|
||||
|
||||
**Mission.** Convert **facts** from Stella Ops into **actionable, noise‑controlled** signals where teams already live (chat/email/webhooks), with **explainable** reasons and deep links to the UI.
|
||||
|
||||
**Boundaries.**
|
||||
|
||||
* Notify **does not make policy decisions** and **does not rescan**; it **consumes** events from Scanner/Scheduler/Excititor/Concelier/Attestor/Zastava and routes them.
|
||||
* Attachments are **links** (UI/attestation pages); Notify **does not** attach SBOMs or large blobs to messages.
|
||||
* Secrets for channels (Slack tokens, SMTP creds) are **referenced**, not stored raw in Mongo.
|
||||
|
||||
---
|
||||
|
||||
## 1) Runtime shape & projects
|
||||
|
||||
```
src/
 ├─ StellaOps.Notify.WebService/      # REST: rules/channels CRUD, test send, deliveries browse
 ├─ StellaOps.Notify.Worker/          # consumers + evaluators + renderers + delivery workers
 ├─ StellaOps.Notify.Connectors.* /   # channel plug-ins: Slack, Teams, Email, Webhook (v1)
 │    └─ *.Tests/
 ├─ StellaOps.Notify.Engine/          # rules engine, templates, idempotency, digests, throttles
 ├─ StellaOps.Notify.Models/          # DTOs (Rule, Channel, Event, Delivery, Template)
 ├─ StellaOps.Notify.Storage.Mongo/   # rules, channels, deliveries, digests, locks
 ├─ StellaOps.Notify.Queue/           # bus client (Redis Streams/NATS JetStream)
 └─ StellaOps.Notify.Tests.*          # unit/integration/e2e
```
|
||||
|
||||
**Deployables**:
|
||||
|
||||
* **Notify.WebService** (stateless API)
|
||||
* **Notify.Worker** (horizontal scale)
|
||||
|
||||
**Dependencies**: Authority (OpToks; DPoP/mTLS), MongoDB, Redis/NATS (bus), HTTP egress to Slack/Teams/Webhooks, SMTP relay for Email.
|
||||
|
||||
---
|
||||
|
||||
## 2) Responsibilities
|
||||
|
||||
1. **Ingest** platform events from internal bus with strong ordering per key (e.g., image digest).
|
||||
2. **Evaluate rules** (tenant‑scoped) with matchers: severity changes, namespaces, repos, labels, KEV flags, provider provenance (VEX), component keys, admission decisions, etc.
|
||||
3. **Control noise**: **throttle**, **coalesce** (digest windows), and **dedupe** via idempotency keys.
|
||||
4. **Render** channel‑specific messages using safe templates; include **evidence** and **links**.
|
||||
5. **Deliver** with retries/backoff; record outcome; expose delivery history to UI.
|
||||
6. **Test** paths (send test to channel targets) without touching live rules.
|
||||
7. **Audit**: log who configured what, when, and why a message was sent.
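
Responsibility 5 (deliver with retries/backoff and record the outcome) might look like the sketch below. It assumes exponential backoff with full jitter; the function name, parameters, and the shape of the attempt records are illustrative and not taken from the Notify codebase.

```python
import random
import time


def deliver_with_retry(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep, rng=random.random):
    """Call send() until it succeeds or max_attempts is reached.

    Returns (status, attempts), where attempts mirrors the per-delivery
    attempt history that the UI could display.
    """
    attempts = []
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            attempts.append({"attempt": attempt, "status": "sent"})
            return "sent", attempts
        except Exception as exc:  # a real connector would classify errors
            attempts.append({"attempt": attempt, "status": "failed",
                             "reason": str(exc)})
            if attempt == max_attempts:
                break
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1)) * rng()
            sleep(delay)
    return "failed", attempts
```

Injecting `sleep` and `rng` keeps the loop deterministic in tests; a production worker would also honor channel-specific hints such as an HTTP 429 retry delay.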
|
||||
|
||||
---
|
||||
|
||||
## 3) Event model (inputs)
|
||||
|
||||
Notify subscribes to the **internal event bus** (produced by services, escaped JSON; gzip allowed with caps):
|
||||
|
||||
* `scanner.scan.completed` — new SBOM(s) composed; artifacts ready
|
||||
* `scanner.report.ready` — analysis verdict (policy+vex) available; carries deltas summary
|
||||
* `scheduler.rescan.delta` — new findings after Concelier/Excititor deltas (already summarized)
|
||||
* `attestor.logged` — Rekor UUID returned (sbom/report/vex export)
|
||||
* `zastava.admission` — admit/deny with reasons, namespace, image digests
|
||||
* `feedser.export.completed` — new export ready (rarely notified directly; usually drives Scheduler)
|
||||
* `vexer.export.completed` — new consensus snapshot (ditto)
|
||||
|
||||
**Canonical envelope (bus → Notify.Engine):**
|
||||
|
||||
```json
|
||||
{
|
||||
"eventId": "uuid",
|
||||
"kind": "scanner.report.ready",
|
||||
"tenant": "tenant-01",
|
||||
"ts": "2025-10-18T05:41:22Z",
|
||||
"actor": "scanner-webservice",
|
||||
"scope": { "namespace":"payments", "repo":"ghcr.io/acme/api", "digest":"sha256:..." },
|
||||
"payload": { /* kind-specific fields, see below */ }
|
||||
}
|
||||
```
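
A consumer would likely validate this envelope before passing it to the rules engine. The following is a minimal sketch of such a shape check; the `Envelope` dataclass and `parse_envelope` helper are hypothetical names, and only the field names come from the envelope above.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Field names follow the canonical envelope; everything else is illustrative.
REQUIRED_FIELDS = ("eventId", "kind", "tenant", "ts", "actor", "scope", "payload")


@dataclass(frozen=True)
class Envelope:
    event_id: str
    kind: str
    tenant: str
    ts: datetime
    actor: str
    scope: dict
    payload: dict


def parse_envelope(raw: str) -> Envelope:
    """Parse a JSON envelope and reject it if a required field is missing."""
    doc = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in doc]
    if missing:
        raise ValueError(f"envelope missing fields: {missing}")
    # Accept the trailing 'Z' (UTC) used in the example timestamp.
    ts = datetime.fromisoformat(doc["ts"].replace("Z", "+00:00"))
    return Envelope(doc["eventId"], doc["kind"], doc["tenant"], ts,
                    doc["actor"], doc["scope"], doc["payload"])
```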
|
||||
|
||||
**Examples (payload cores):**
|
||||
|
||||
* `scanner.report.ready`:
|
||||
|
||||
```json
{ "verdict":"fail|warn|pass",
  "delta": { "newCritical":1, "newHigh":2, "kev":["CVE-2025-..."] },
  "topFindings":[{"purl":"pkg:rpm/openssl","vulnId":"CVE-2025-...","severity":"critical"}],
  "links":{"ui":"https://ui/...","rekor":"https://rekor/..."} }
```
|
||||
|
||||
* `zastava.admission`:
|
||||
|
||||
```json
{ "decision":"deny|allow", "reasons":["unsigned image","missing SBOM"],
  "images":[{"digest":"sha256:...","signed":false,"hasSbom":false}] }
```
|
||||
|
||||
---
|
||||
|
||||
## 4) Rules engine — semantics
|
||||
|
||||
**Rule shape (simplified):**
|
||||
|
||||
```yaml
name: "high-critical-alerts-prod"
enabled: true
match:
  eventKinds: ["scanner.report.ready","scheduler.rescan.delta","zastava.admission"]
  namespaces: ["prod-*"]
  repos: ["ghcr.io/acme/*"]
  minSeverity: "high"            # min of new findings (delta context)
  kev: true                      # require KEV-tagged or allow any if false
  verdict: ["fail","deny"]       # filter for report/admission
  vex:
    includeRejectedJustifications: false   # notify only on accepted 'affected'
actions:
  - channel: "slack:sec-alerts"  # reference to Channel object
    template: "concise"
    throttle: "5m"
  - channel: "email:soc"
    digest: "hourly"
    template: "detailed"
```
|
||||
|
||||
**Evaluation order**
|
||||
|
||||
1. **Tenant check** → discard if rule tenant ≠ event tenant.
|
||||
2. **Kind filter** → discard early.
|
||||
3. **Scope match** (namespace/repo/labels).
|
||||
4. **Delta/severity gates** (if event carries `delta`).
|
||||
5. **VEX gate** (drop if event’s finding is not affected under policy consensus unless rule says otherwise).
|
||||
6. **Throttling/dedup** (idempotency key) — skip if suppressed.
|
||||
7. **Actions** → enqueue per‑channel job(s).
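
Steps 1 to 4 of the evaluation order can be sketched as a single predicate. This is an assumption-laden illustration: the `tenantId` field on the rule, the `newLow`/`newMedium` counters, and glob matching via `fnmatchcase` are guesses at the semantics, and the VEX gate, throttling, and action enqueueing (steps 5 to 7) are left out.

```python
from fnmatch import fnmatchcase

# Assumed severity ordering; only "high" and "critical" appear in the examples.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def max_new_severity(delta: dict) -> int:
    """Highest severity rank that has a non-zero new-finding count."""
    ranks = [0]
    for sev, rank in SEVERITY_RANK.items():
        if delta.get("new" + sev.capitalize(), 0) > 0:
            ranks.append(rank)
    return max(ranks)


def rule_matches(rule: dict, event: dict) -> bool:
    match = rule["match"]
    # 1. Tenant check
    if rule["tenantId"] != event["tenant"]:
        return False
    # 2. Kind filter
    if event["kind"] not in match.get("eventKinds", []):
        return False
    # 3. Scope match: every configured pattern list must match by glob
    scope = event.get("scope", {})
    for field, key in (("namespaces", "namespace"), ("repos", "repo")):
        patterns = match.get(field)
        if patterns and not any(fnmatchcase(scope.get(key, ""), p) for p in patterns):
            return False
    # 4. Delta/severity gate, applied only when the event carries a delta
    delta = event.get("payload", {}).get("delta")
    min_sev = match.get("minSeverity")
    if delta is not None and min_sev:
        if max_new_severity(delta) < SEVERITY_RANK[min_sev]:
            return False
    return True
```

Ordering the cheap checks first (tenant, kind) means most events are discarded before any glob or delta work is done.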
|
||||
|
||||
**Idempotency key**: `hash(ruleId | actionId | event.kind | scope.digest | delta.hash | day-bucket)`; ensures “same alert” doesn’t fire more than once within throttle window.
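
One way to compute such a key is shown below. The document does not name the hash function or the bucket timezone, so SHA-256 and a UTC day bucket are assumptions, as is the `idempotency_key` function name; the `idem:` prefix follows the `throttles` key format used elsewhere in this document.

```python
import hashlib
from datetime import datetime, timezone


def idempotency_key(rule_id: str, action_id: str, kind: str,
                    digest: str, delta_hash: str, ts: datetime) -> str:
    """Hash the key parts in the documented order, joined with '|'.

    ts should be timezone-aware; it is normalized to UTC so that all
    workers agree on the day bucket.
    """
    day_bucket = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    material = "|".join([rule_id, action_id, kind, digest, delta_hash, day_bucket])
    return "idem:" + hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Two events with the same rule, action, kind, digest, and delta hash on the same UTC day yield the same key, so a second delivery can be skipped by checking whether the key already exists.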
|
||||
|
||||
**Digest windows**: maintain per action a **coalescer**:
|
||||
|
||||
* Window: `5m|15m|1h|1d` (configurable); coalesces events by tenant + namespace/repo or by digest group.
|
||||
* Digest messages summarize top N items and counts, with safe truncation.
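
A per-action coalescer along these lines could be sketched as follows. The class and method names are hypothetical, state is kept in memory rather than in the `digests` collection, and "top N" is approximated by the first N buffered events rather than a ranked selection.

```python
from collections import defaultdict

WINDOW_SECONDS = {"5m": 300, "15m": 900, "1h": 3600, "1d": 86400}


class Coalescer:
    """Buffers events per (tenant, group) and flushes once the window elapses."""

    def __init__(self, window: str, top_n: int = 5):
        self.window_s = WINDOW_SECONDS[window]
        self.top_n = top_n
        self.opened_at = {}             # key -> time the window was opened
        self.items = defaultdict(list)  # key -> buffered events

    def add(self, tenant: str, group: str, event: dict, now: float) -> None:
        key = (tenant, group)
        self.opened_at.setdefault(key, now)  # first event opens the window
        self.items[key].append(event)

    def flush_due(self, now: float) -> list:
        """Return one summary per window open for at least window_s seconds."""
        summaries = []
        for key, opened in list(self.opened_at.items()):
            if now - opened >= self.window_s:
                events = self.items.pop(key)
                del self.opened_at[key]
                summaries.append({
                    "tenant": key[0],
                    "group": key[1],
                    "count": len(events),
                    "top": events[: self.top_n],  # truncated item list
                })
        return summaries
```

Here `group` would be the namespace/repo or digest group chosen by the rule, and `flush_due` would run on a timer in the worker.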
|
||||
|
||||
---
|
||||
|
||||
## 5) Channels & connectors (plug‑ins)
|
||||
|
||||
Channel config is **two‑part**: a **Channel** record (name, type, options) and a Secret **reference** (Vault/K8s Secret). Connectors are **restart-time plug-ins** discovered on service start (same manifest convention as Concelier/Excititor) and live under `plugins/notify/<channel>/`.
|
||||
|
||||
**Built‑in v1:**
|
||||
|
||||
* **Slack**: Bot token (xoxb‑…), `chat.postMessage` + `blocks`; rate limit aware (HTTP 429).
|
||||
* **Microsoft Teams**: Incoming Webhook (or Graph card later); adaptive card payloads.
|
||||
* **Email (SMTP)**: TLS (STARTTLS or implicit), From/To/CC/BCC; HTML+text alt; DKIM optional.
|
||||
* **Generic Webhook**: POST JSON with a signature in headers (HMAC‑SHA‑256 or Ed25519).
|
||||
|
||||
**Connector contract:** (implemented by plug-in assemblies)
|
||||
|
||||
```csharp
public interface INotifyConnector {
  string Type { get; }  // "slack" | "teams" | "email" | "webhook" | ...
  Task<DeliveryResult> SendAsync(DeliveryContext ctx, CancellationToken ct);
  Task<HealthResult> HealthAsync(ChannelConfig cfg, CancellationToken ct);
}
```
|
||||
|
||||
**DeliveryContext** includes **rendered content** and **raw event** for audit.
|
||||
|
||||
**Secrets**: `ChannelConfig.secretRef` points to Authority‑managed secret handle or K8s Secret path; workers load at send-time; plug-in manifests (`notify-plugin.json`) declare capabilities and version.
|
||||
|
||||
---
|
||||
|
||||
## 6) Templates & rendering
|
||||
|
||||
**Template engine**: strongly typed, safe Handlebars‑style; no arbitrary code. Partial templates per channel. Deterministic outputs (stable property order, no locale drift unless requested).
|
||||
|
||||
**Variables** (examples):
|
||||
|
||||
* `event.kind`, `event.ts`, `scope.namespace`, `scope.repo`, `scope.digest`
|
||||
* `payload.verdict`, `payload.delta.newCritical`, `payload.links.ui`, `payload.links.rekor`
|
||||
* `topFindings[]` with `purl`, `vulnId`, `severity`
|
||||
* `policy.name`, `policy.revision` (if available)
|
||||
|
||||
**Helpers**:
|
||||
|
||||
* `severity_icon(sev)`, `link(text,url)`, `pluralize(n, "finding")`, `truncate(text, n)`, `code(text)`.
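
For illustration, three of these helpers might behave as sketched below. The exact semantics (naive English plurals, an ellipsis counted inside the length limit, backtick escaping) are assumptions, not the template engine's documented behavior.

```python
def pluralize(n: int, noun: str) -> str:
    """Naive English plural: '1 finding', '3 findings'."""
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def truncate(text: str, n: int) -> str:
    """Cut text to at most n characters, ending with an ellipsis if cut."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return text if len(text) <= n else text[: n - 1] + "…"


def code(text: str) -> str:
    """Wrap text in backticks, replacing embedded backticks so it stays inline."""
    return "`" + text.replace("`", "'") + "`"
```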
|
||||
|
||||
**Channel mapping**:
|
||||
|
||||
* Slack: title + blocks, limited to 50 blocks/3000 chars per section; long lists → link to UI.
|
||||
* Teams: Adaptive Card schema 1.5; fallback text for older channels.
|
||||
* Email: HTML + text; inline table of top N findings, rest behind UI link.
|
||||
* Webhook: JSON with `event`, `ruleId`, `actionId`, `summary`, `links`, and raw `payload` subset.
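
The webhook mapping could be combined with an HMAC signature roughly as follows. This is a sketch under stated assumptions: the `X-Notify-Signature` header name, the `sha256=` prefix, the nested shape of `event`, and the choice of `delta` as the payload subset are all hypothetical.

```python
import hashlib
import hmac
import json


def build_webhook_request(event: dict, rule_id: str, action_id: str,
                          summary: str, secret: bytes) -> tuple:
    """Return (raw_body, headers) for a signed webhook POST."""
    payload = event.get("payload", {})
    body = {
        "event": {"id": event["eventId"], "kind": event["kind"]},
        "ruleId": rule_id,
        "actionId": action_id,
        "summary": summary,
        "links": payload.get("links", {}),
        "payload": {"delta": payload.get("delta")},  # subset only
    }
    # Sorted keys and compact separators keep the signed bytes deterministic.
    raw = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, raw, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        "X-Notify-Signature": "sha256=" + signature,  # hypothetical header name
    }
    return raw, headers


def verify_webhook(raw: bytes, header_value: str, secret: bytes) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = "sha256=" + hmac.new(secret, raw, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_value)
```

The receiver must verify against the exact bytes it received, not a re-serialized copy, since any change in key order or whitespace changes the digest.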
|
||||
|
||||
**i18n**: template set per locale (English default; Bulgarian built‑in).
|
||||
|
||||
---
|
||||
|
||||
## 7) Data model (Mongo)
|
||||
|
||||
**Database**: `notify`
|
||||
|
||||
* `rules`
|
||||
|
||||
```
|
||||
{ _id, tenantId, name, enabled, match, actions, createdBy, updatedBy, createdAt, updatedAt }
|
||||
```
|
||||
|
||||
* `channels`
|
||||
|
||||
```
|
||||
{ _id, tenantId, name:"slack:sec-alerts", type:"slack",
|
||||
config:{ webhookUrl?:"", channel:"#sec-alerts", workspace?: "...", secretRef:"ref://..." },
|
||||
createdAt, updatedAt }
|
||||
```
|
||||
|
||||
* `deliveries`
|
||||
|
||||
```
|
||||
{ _id, tenantId, ruleId, actionId, eventId, kind, scope, status:"sent|failed|throttled|digested|dropped",
|
||||
attempts:[{ts, status, code, reason}],
|
||||
rendered:{ title, body, target }, // redacted for PII; body hash stored
|
||||
sentAt, lastError? }
|
||||
```
|
||||
|
||||
* `digests`
|
||||
|
||||
```
|
||||
{ _id, tenantId, actionKey, window:"hourly", openedAt, items:[{eventId, scope, delta}], status:"open|flushed" }
|
||||
```
|
||||
|
||||
* `throttles`
|
||||
|
||||
```
|
||||
{ key:"idem:<hash>", ttlAt } // short-lived, also cached in Redis
|
||||
```
|
||||
|
||||
**Indexes**: rules by `{tenantId, enabled}`, deliveries by `{tenantId, sentAt desc}`, digests by `{tenantId, actionKey}`.
|
||||
|
||||
---
|
||||
|
||||
## 8) External APIs (WebService)
|
||||
|
||||
Base path: `/api/v1/notify` (Authority OpToks; scopes: `notify.admin` for write, `notify.read` for view).
|
||||
|
||||
* **Channels**
|
||||
|
||||
* `POST /channels` | `GET /channels` | `GET /channels/{id}` | `PATCH /channels/{id}` | `DELETE /channels/{id}`
|
||||
* `POST /channels/{id}/test` → send sample message (no rule evaluation)
|
||||
* `GET /channels/{id}/health` → connector self‑check
|
||||
|
||||
* **Rules**
|
||||
|
||||
* `POST /rules` | `GET /rules` | `GET /rules/{id}` | `PATCH /rules/{id}` | `DELETE /rules/{id}`
|
||||
* `POST /rules/{id}/test` → dry‑run rule against a **sample event** (no delivery unless `--send`)
|
||||
|
||||
* **Deliveries**
|
||||
|
||||
* `GET /deliveries?tenant=...&since=...` → list
|
||||
* `GET /deliveries/{id}` → detail (redacted body + metadata)
|
||||
* `POST /deliveries/{id}/retry` → force retry (admin)
|
||||
|
||||
* **Admin**
|
||||
|
||||
* `GET /stats` (per tenant counts, last hour/day)
|
||||
* `GET /healthz|readyz` (liveness / readiness)
|
||||
|
||||
**Ingestion**: workers do **not** expose public ingestion; they **subscribe** to the internal bus. (Optional `/events/test` for integration testing, admin‑only.)
|
||||
|
||||
---
|
||||
|
||||
## 9) Delivery pipeline (worker)
|
||||
|
||||
```
|
||||
[Event bus] → [Ingestor] → [RuleMatcher] → [Throttle/Dedupe] → [DigestCoalescer] → [Renderer] → [Connector] → [Result]
|
||||
└────────→ [DeliveryStore]
|
||||
```
|
||||
|
||||
* **Ingestor**: N consumers with per‑key ordering (key = tenant|digest|namespace).
|
||||
* **RuleMatcher**: loads active rules snapshot for tenant into memory; vectorized predicate check.
|
||||
* **Throttle/Dedupe**: consult Redis + Mongo `throttles`; if hit → record `status=throttled`.
|
||||
* **DigestCoalescer**: append to open digest window or flush when timer expires.
|
||||
* **Renderer**: select template (channel+locale), inject variables, enforce length limits, compute `bodyHash`.
|
||||
* **Connector**: send; handle provider‑specific rate limits and backoffs; retry up to `maxAttempts` with exponential backoff and jitter; overflow → DLQ (dead‑letter topic) + UI surfacing.
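The DigestCoalescer stage can be pictured as a map of open windows keyed by action. The sketch below is an in-memory Python illustration only: the class names are hypothetical, and treating `maxItems` as an early-flush threshold is an assumption, since the document does not say what happens when a window fills up.

```python
from dataclasses import dataclass, field


@dataclass
class DigestWindow:
    action_key: str
    opened_at: float                      # epoch seconds when the window opened
    items: list = field(default_factory=list)
    status: str = "open"                  # "open" | "flushed"


class DigestCoalescer:
    """In-memory stand-in for the `digests` collection (assumed semantics)."""

    def __init__(self, window_seconds=3600.0, max_items=100):
        self.window_seconds = window_seconds
        self.max_items = max_items
        self._open = {}                   # action_key -> DigestWindow

    def append(self, action_key, item, now):
        """Add an item to the open window for this action, opening one if needed."""
        window = self._open.get(action_key)
        if window is None:
            window = DigestWindow(action_key=action_key, opened_at=now)
            self._open[action_key] = window
        window.items.append(item)

    def flush_due(self, now):
        """Close and return every window that has expired or reached max_items."""
        flushed = []
        for key in list(self._open):
            window = self._open[key]
            expired = now - window.opened_at >= self.window_seconds
            full = len(window.items) >= self.max_items
            if expired or full:
                window.status = "flushed"
                flushed.append(self._open.pop(key))
        return flushed
```

A worker would call `flush_due` on a timer and hand each returned window to the Renderer as one digest message.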
|
||||
|
||||
**Idempotency**: per action **idempotency key** stored in Redis (TTL = `throttle window` or `digest window`). Connectors also respect **provider** idempotency where available (e.g., Slack `client_msg_id`).
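As a rough illustration of the idempotency check, the Python sketch below derives an `idem:<hash>` key and applies "set if absent, with TTL" semantics. The fields that feed the hash are an assumption (this section does not list them), and `ThrottleStore` is a hypothetical in-memory stand-in for the Redis lookup.

```python
import hashlib


def idempotency_key(tenant_id, rule_id, action_id, event_kind, scope_digest):
    """Build an `idem:<hash>` key; the choice of input fields is assumed."""
    material = "|".join([tenant_id, rule_id, action_id, event_kind, scope_digest])
    return "idem:" + hashlib.sha256(material.encode("utf-8")).hexdigest()


class ThrottleStore:
    """In-memory stand-in for a 'set if absent, with expiry' key store."""

    def __init__(self):
        self._expiry = {}                 # key -> expiry time (epoch seconds)

    def try_acquire(self, key, ttl_seconds, now):
        """Return True when the key is new or expired; False means duplicate."""
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True
```

A `False` result corresponds to recording the delivery with `status=throttled` instead of sending it again.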
|
||||
|
||||
---
|
||||
|
||||
## 10) Reliability & rate controls
|
||||
|
||||
* **Per‑tenant** RPM caps (default 600/min) + **per‑channel** concurrency (Slack 1–4, Teams 1–2, Email 8–32 based on relay).
|
||||
* **Backoff** map: Slack 429 → respect `Retry‑After`; SMTP 4xx (transient) → retry; provider HTTP 5xx → retry with jitter; permanent rejects (e.g., SMTP 5xx) → drop with status recorded.
|
||||
* **DLQ**: NATS/Redis stream `notify.dlq` with `{event, rule, action, error}` for operator inspection; UI shows DLQ items.
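The backoff rules above reduce to one decision: honor a provider's `Retry‑After` when it is present, otherwise back off exponentially with jitter. A small Python sketch of that decision follows. The base delay, the cap, and the "full jitter" variant are illustrative assumptions, not values taken from this document.

```python
import random


def next_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (1-based).

    A provider-supplied Retry-After (in seconds) always wins. Otherwise the
    delay is drawn uniformly from [0, min(cap, base * 2**(attempt - 1))).
    """
    if retry_after is not None:
        return max(0.0, float(retry_after))
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return rng() * ceiling
```

The `rng` parameter exists only so the jitter can be pinned in tests; production code would leave the default.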
|
||||
|
||||
---
|
||||
|
||||
## 11) Security & privacy
|
||||
|
||||
* **AuthZ**: all APIs require **Authority** OpToks; actions scoped by tenant.
|
||||
* **Secrets**: `secretRef` only; Notify fetches just‑in‑time from Authority Secret proxy or K8s Secret (mounted). No plaintext secrets in Mongo.
|
||||
* **Egress TLS**: validate server certificates; pin domains per channel config; optional CA bundle override for on‑prem SMTP.
|
||||
* **Webhook signing**: HMAC or Ed25519 signatures in `X-StellaOps-Signature` + replay‑window timestamp; include canonical body hash in header.
|
||||
* **Redaction**: deliveries store **hashes** of bodies, not full payloads for chat/email to minimize PII retention (configurable).
|
||||
* **Quiet hours**: per tenant (e.g., 22:00–06:00) route high‑sev only; defer others to digests.
|
||||
* **Loop prevention**: Webhook target allowlist + event origin tags; do not ingest own webhooks.
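To make the webhook signing bullet concrete, here is a Python sketch of the HMAC variant with a replay window. The header layout (`t=<timestamp>,v1=<hex>`), the `X-StellaOps-Body-SHA256` header name, and the 300‑second tolerance are assumptions for illustration; only `X-StellaOps-Signature` is named in this document.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> dict:
    """Return signature headers for a webhook body (assumed header layout)."""
    body_hash = hashlib.sha256(body).hexdigest()
    message = f"{timestamp}.{body_hash}".encode("ascii")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {
        "X-StellaOps-Signature": f"t={timestamp},v1={signature}",
        "X-StellaOps-Body-SHA256": body_hash,   # hypothetical header name
    }


def verify_webhook(secret: bytes, body: bytes, headers: dict, now: int,
                   tolerance: int = 300) -> bool:
    """Receiver-side check: timestamp inside the replay window, signature matches."""
    received = headers["X-StellaOps-Signature"]
    parts = dict(p.split("=", 1) for p in received.split(","))
    timestamp = int(parts["t"])
    if abs(now - timestamp) > tolerance:
        return False
    expected = sign_webhook(secret, body, timestamp)["X-StellaOps-Signature"]
    return hmac.compare_digest(expected, received)
```

Signing the timestamp together with the body hash means a captured request cannot be replayed outside the tolerance window, and the constant‑time comparison avoids leaking how many signature characters matched.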
|
||||
|
||||
---
|
||||
|
||||
## 12) Observability (Prometheus + OTEL)
|
||||
|
||||
* `notify.events_consumed_total{kind}`
|
||||
* `notify.rules_matched_total{ruleId}`
|
||||
* `notify.throttled_total{reason}`
|
||||
* `notify.digest_coalesced_total{window}`
|
||||
* `notify.sent_total{channel}` / `notify.failed_total{channel,code}`
|
||||
* `notify.delivery_latency_seconds{channel}` (end‑to‑end)
|
||||
* **Tracing**: spans `ingest`, `match`, `render`, `send`; correlation id = `eventId`.
|
||||
|
||||
**SLO targets**
|
||||
|
||||
* Event→delivery p95 **≤ 30–60 s** under nominal load.
|
||||
* Failure rate **< 0.5%** per hour (excluding provider outages).
|
||||
* Duplicate rate **≈ 0** (idempotency working).
|
||||
|
||||
---
|
||||
|
||||
## 13) Configuration (YAML)
|
||||
|
||||
```yaml
notify:
  authority:
    issuer: "https://authority.internal"
    require: "dpop"            # or "mtls"
  bus:
    kind: "redis"              # or "nats"
    streams:
      - "scanner.events"
      - "scheduler.events"
      - "attestor.events"
      - "zastava.events"
  mongo:
    uri: "mongodb://mongo/notify"
  limits:
    perTenantRpm: 600
    perChannel:
      slack: { concurrency: 2 }
      teams: { concurrency: 1 }
      email: { concurrency: 8 }
      webhook: { concurrency: 8 }
  digests:
    defaultWindow: "1h"
    maxItems: 100
  quietHours:
    enabled: true
    window: "22:00-06:00"
    minSeverity: "critical"
  webhooks:
    sign:
      method: "ed25519"        # or "hmac-sha256"
      keyRef: "ref://notify/webhook-sign-key"
```
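The `quietHours` settings imply a window that wraps past midnight, which is easy to get wrong. The Python sketch below shows one way to evaluate it. The function names and the severity ordering are assumptions, and the time passed in is expected to be the tenant's local time already.

```python
from datetime import time

# Assumed ordering of severity labels, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def in_quiet_hours(local_time: time, window: str) -> bool:
    """True when local_time falls inside an 'HH:MM-HH:MM' window."""
    start_text, end_text = window.split("-")
    start, end = time.fromisoformat(start_text), time.fromisoformat(end_text)
    if start <= end:
        return start <= local_time < end
    # Window wraps past midnight, e.g. 22:00-06:00.
    return local_time >= start or local_time < end


def route(local_time: time, severity: str,
          window: str = "22:00-06:00", min_severity: str = "critical") -> str:
    """Return 'immediate' or 'digest' for one notification."""
    below = SEVERITY_RANK[severity] < SEVERITY_RANK[min_severity]
    if in_quiet_hours(local_time, window) and below:
        return "digest"
    return "immediate"
```

With the defaults shown in the configuration, a `high` finding at 23:30 is deferred to a digest while a `critical` one still goes out immediately.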
|
||||
|
||||
---
|
||||
|
||||
## 14) UI touch‑points
|
||||
|
||||
* **Notifications → Channels**: add Slack/Teams/Email/Webhook; run **health**; rotate secrets.
|
||||
* **Notifications → Rules**: create/edit YAML rules with linting; test with sample events; see match rate.
|
||||
* **Notifications → Deliveries**: timeline with filters (status, channel, rule); inspect last error; retry.
|
||||
* **Digest preview**: shows current window contents and when it will flush.
|
||||
* **Quiet hours**: configure per tenant; show overrides.
|
||||
* **DLQ**: browse dead‑letters; requeue after fix.
|
||||
|
||||
---
|
||||
|
||||
## 15) Failure modes & responses
|
||||
|
||||
| Condition                           | Behavior                                                                              |
| ----------------------------------- | ------------------------------------------------------------------------------------- |
| Slack 429 / Teams 429               | Respect `Retry‑After`, backoff with jitter, reduce concurrency                        |
| SMTP transient 4xx                  | Retry up to `maxAttempts`; escalate to DLQ on exhaust                                 |
| Invalid channel secret              | Mark channel unhealthy; suppress sends; surface in UI                                 |
| Rule explosion (matches everything) | Safety valve: per‑tenant RPM caps; auto‑pause rule after X drops; UI alert            |
| Bus outage                          | Buffer to local queue (bounded); resume consuming when healthy                        |
| Mongo slowness                      | Fall back to Redis throttles; batch write deliveries; shed low‑priority notifications |
|
||||
|
||||
---
|
||||
|
||||
## 16) Testing matrix
|
||||
|
||||
* **Unit**: matchers, throttle math, digest coalescing, idempotency keys, template rendering edge cases.
|
||||
* **Connectors**: provider‑level rate limits, payload size truncation, error mapping.
|
||||
* **Integration**: synthetic event storm (10k/min), ensure p95 latency & duplicate rate.
|
||||
* **Security**: DPoP/mTLS on APIs; secretRef resolution; webhook signing & replay windows.
|
||||
* **i18n**: localized templates render deterministically.
|
||||
* **Chaos**: Slack/Teams API flaps; SMTP greylisting; Redis hiccups; ensure graceful degradation.
|
||||
|
||||
---
|
||||
|
||||
## 17) Sequences (representative)
|
||||
|
||||
**A) New criticals after Feedser delta (Slack immediate + Email hourly digest)**
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant SCH as Scheduler
    participant NO as Notify.Worker
    participant SL as Slack
    participant SMTP as Email

    SCH->>NO: bus event scheduler.rescan.delta { newCritical:1, digest:sha256:... }
    NO->>NO: match rules (Slack immediate; Email hourly digest)
    NO->>SL: chat.postMessage (concise)
    SL-->>NO: 200 OK
    NO->>NO: append to digest window (email:soc)
    Note over NO: At window close → render digest email
    NO->>SMTP: send email (detailed digest)
    SMTP-->>NO: 250 OK
```
|
||||
|
||||
**B) Admission deny (Teams card + Webhook)**
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant ZA as Zastava
    participant NO as Notify.Worker
    participant TE as Teams
    participant WH as Webhook

    ZA->>NO: bus event zastava.admission { decision: "deny", reasons: [...] }
    NO->>TE: POST adaptive card
    TE-->>NO: 200 OK
    NO->>WH: POST JSON (signed)
    WH-->>NO: 2xx
```
|
||||
|
||||
---
|
||||
|
||||
## 18) Implementation notes
|
||||
|
||||
* **Language**: .NET 10; minimal API; `System.Text.Json` with canonical writer for body hashing; Channels for pipelines.
|
||||
* **Bus**: Redis Streams (**XGROUP** consumers) or NATS JetStream for at‑least‑once with ack; per‑tenant consumer groups to localize backpressure.
|
||||
* **Templates**: compile and cache per rule+channel+locale; version with rule `updatedAt` to invalidate.
|
||||
* **Rules**: store raw YAML + parsed AST; validate with schema + static checks (e.g., nonsensical combos).
|
||||
* **Secrets**: pluggable secret resolver (Authority Secret proxy, K8s, Vault).
|
||||
* **Rate limiting**: `System.Threading.RateLimiting` + per‑connector adapters.
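The "canonical writer for body hashing" note is about making the hash independent of property order and whitespace. A Python sketch of the idea follows. It sorts keys and strips whitespace, which is a simplification and not a full canonicalization scheme such as RFC 8785, and the `sha256:` prefix on the result is an assumption.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def body_hash(value) -> str:
    """Hash the canonical form so equal content always yields the same digest."""
    return "sha256:" + hashlib.sha256(canonical_json(value)).hexdigest()
```

Two renderings that differ only in key order then share one `bodyHash`, which keeps the redacted delivery record stable across retries.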
|
||||
|
||||
---
|
||||
|
||||
## 19) Roadmap (post‑v1)
|
||||
|
||||
* **PagerDuty/Opsgenie** connectors; **Jira** ticket creation.
|
||||
* **User inbox** (in‑app notifications) + mobile push via webhook relay.
|
||||
* **Anomaly suppression**: auto‑pause noisy rules with hints (learned thresholds).
|
||||
* **Graph rules**: “only notify if *not_affected → affected* transition at consensus layer”.
|
||||
* **Label enrichment**: pluggable taggers (business criticality, data classification) to refine matchers.
|
||||
@@ -40,6 +40,8 @@ src/
|
||||
└─ StellaOps.Scanner.Sbomer.DockerImage/ # CLI‑driven scanner container
|
||||
```
|
||||
|
||||
Analyzer assemblies and buildx generators are packaged as **restart-time plug-ins** under `plugins/scanner/**` with manifests; services must restart to activate new plug-ins.
|
||||
|
||||
**Runtime form‑factor:** two deployables
|
||||
|
||||
* **Scanner.WebService** (stateless REST)
|
||||
@@ -410,4 +412,3 @@ vector<string> purls
|
||||
map<purlIndex, roaring_bitmap> components
|
||||
optional map<purlIndex, roaring_bitmap> usedByEntrypoint
|
||||
```
|
||||
|
||||
|
||||
424
docs/ARCHITECTURE_SCHEDULER.md
Normal file
@@ -0,0 +1,424 @@
|
||||
# component_architecture_scheduler.md — **Stella Ops Scheduler** (2025Q4)
|
||||
|
||||
> **Scope.** Implementation‑ready architecture for **Scheduler**: a service that (1) **re‑evaluates** already‑cataloged images when intel changes (Feedser/Vexer/policy), (2) orchestrates **nightly** and **ad‑hoc** runs, (3) targets only the **impacted** images using the BOM‑Index, and (4) emits **report‑ready** events that downstream **Notify** fans out. Default mode is **analysis‑only** (no image pull); optional **content‑refresh** can be enabled per schedule.
|
||||
|
||||
---
|
||||
|
||||
## 0) Mission & boundaries
|
||||
|
||||
**Mission.** Keep scan results **current** without rescanning the world. When new advisories or VEX claims land, **pinpoint** affected images and ask the backend to recompute **verdicts** against the **existing SBOMs**. Surface only **meaningful deltas** to humans and ticket queues.
|
||||
|
||||
**Boundaries.**
|
||||
|
||||
* Scheduler **does not** compute SBOMs and **does not** sign. It calls Scanner/WebService’s **/reports (analysis‑only)** endpoint and lets the backend (Policy + Vexer + Feedser) decide PASS/FAIL.
|
||||
* Scheduler **may** ask Scanner to **content‑refresh** selected targets (e.g., mutable tags) but the default is **no** image pull.
|
||||
* Notifications are **not** sent directly; Scheduler emits events consumed by **Notify**.
|
||||
|
||||
---
|
||||
|
||||
## 1) Runtime shape & projects
|
||||
|
||||
```
|
||||
src/
|
||||
├─ StellaOps.Scheduler.WebService/ # REST (schedules CRUD, runs, admin)
|
||||
├─ StellaOps.Scheduler.Worker/ # planners + runners (N replicas)
|
||||
├─ StellaOps.Scheduler.ImpactIndex/ # purl→images inverted index (roaring bitmaps)
|
||||
├─ StellaOps.Scheduler.Models/ # DTOs (Schedule, Run, ImpactSet, Deltas)
|
||||
├─ StellaOps.Scheduler.Storage.Mongo/ # schedules, runs, cursors, locks
|
||||
├─ StellaOps.Scheduler.Queue/ # Redis Streams / NATS abstraction
|
||||
├─ StellaOps.Scheduler.Tests.* # unit/integration/e2e
|
||||
```
|
||||
|
||||
**Deployables**:
|
||||
|
||||
* **Scheduler.WebService** (stateless)
|
||||
* **Scheduler.Worker** (scale‑out; planners + executors)
|
||||
|
||||
**Dependencies**: Authority (OpTok + DPoP/mTLS), Scanner.WebService, Feedser, Vexer, MongoDB, Redis/NATS, (optional) Notify.
|
||||
|
||||
---
|
||||
|
||||
## 2) Core responsibilities
|
||||
|
||||
1. **Time‑based** runs: cron windows per tenant/timezone (e.g., “02:00 Europe/Sofia”).
|
||||
2. **Event‑driven** runs: react to **Feedser export** and **Vexer export** deltas (changed product keys / advisories / claims).
|
||||
3. **Impact targeting**: map changes to **image sets** using a **global inverted index** built from Scanner’s per‑image **BOM‑Index** sidecars.
|
||||
4. **Run planning**: shard, pace, and rate‑limit jobs to avoid thundering herds.
|
||||
5. **Execution**: call Scanner **/reports (analysis‑only)** or **/scans (content‑refresh)**; aggregate **delta** results.
|
||||
6. **Events**: publish `rescan.delta` and `report.ready` summaries for **Notify** & **UI**.
|
||||
7. **Control plane**: CRUD schedules, **pause/resume**, dry‑run previews, audit.
|
||||
|
||||
---
|
||||
|
||||
## 3) Data model (Mongo)
|
||||
|
||||
**Database**: `scheduler`
|
||||
|
||||
* `schedules`
|
||||
|
||||
```
|
||||
{ _id, tenantId, name, enabled, whenCron, timezone,
|
||||
mode: "analysis-only" | "content-refresh",
|
||||
selection: { scope: "all-images" | "by-namespace" | "by-repo" | "by-digest" | "by-labels",
|
||||
includeTags?: ["prod-*"], digests?: [sha256...], resolvesTags?: bool },
|
||||
onlyIf: { lastReportOlderThanDays?: int, policyRevision?: string },
|
||||
notify: { onNewFindings: bool, minSeverity: "low|medium|high|critical", includeKEV: bool },
|
||||
limits: { maxJobs?: int, ratePerSecond?: int, parallelism?: int },
|
||||
createdAt, updatedAt, createdBy, updatedBy }
|
||||
```
|
||||
|
||||
* `runs`
|
||||
|
||||
```
|
||||
{ _id, scheduleId?, tenantId, trigger: "cron|feedser|vexer|manual",
|
||||
reason?: { feedserExportId?, vexerExportId?, cursor? },
|
||||
state: "planning|queued|running|completed|error|cancelled",
|
||||
stats: { candidates: int, deduped: int, queued: int, completed: int, deltas: int, newCriticals: int },
|
||||
startedAt, finishedAt, error? }
|
||||
```
|
||||
|
||||
* `impact_cursors`
|
||||
|
||||
```
|
||||
{ _id: tenantId, feedserLastExportId, vexerLastExportId, updatedAt }
|
||||
```
|
||||
|
||||
* `locks` (singleton schedulers, run leases)
|
||||
|
||||
* `audit` (CRUD actions, run outcomes)
|
||||
|
||||
**Indexes**:
|
||||
|
||||
* `schedules` on `{tenantId, enabled}`, `{whenCron}`.
|
||||
* `runs` on `{tenantId, startedAt desc}`, `{state}`.
|
||||
* TTL optional for completed runs (e.g., 180 days).
|
||||
|
||||
---
|
||||
|
||||
## 4) ImpactIndex (global inverted index)
|
||||
|
||||
Goal: translate **change keys** → **image sets** in **milliseconds**.
|
||||
|
||||
**Source**: Scanner produces per‑image **BOM‑Index** sidecars (purls, and `usedByEntrypoint` bitmaps). Scheduler ingests/refreshes them to build a **global** index.
|
||||
|
||||
**Representation**:
|
||||
|
||||
* Assign **image IDs** (dense ints) to catalog images.
|
||||
* Keep **Roaring Bitmaps**:
|
||||
|
||||
* `Contains[purl] → bitmap(imageIds)`
|
||||
* `UsedBy[purl] → bitmap(imageIds)` (subset of Contains)
|
||||
* Optionally keep **Owner maps**: `{imageId → {tenantId, namespaces[], repos[]}}` for selection filters.
|
||||
* Persist in RocksDB/LMDB or Redis‑modules; cache hot shards in memory; snapshot to Mongo for cold start.
|
||||
|
||||
**Update paths**:
|
||||
|
||||
* On new/updated image SBOM: **merge** per‑image set into global maps.
|
||||
* On image remove/expiry: **clear** id from bitmaps.
|
||||
|
||||
**API (internal)**:
|
||||
|
||||
```csharp
public interface IImpactIndex {
    ImpactSet ResolveByPurls(IEnumerable<string> purls, bool usageOnly, Selector sel);
    ImpactSet ResolveByVulns(IEnumerable<string> vulnIds, bool usageOnly, Selector sel); // optional (vuln->purl precomputed by Feedser)
    ImpactSet ResolveAll(Selector sel); // for nightly
}
```
|
||||
|
||||
**Selector filters**: tenant, namespaces, repos, labels, digest allowlists, `includeTags` patterns.
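The inverted-index idea can be shown in a few lines. The Python sketch below uses plain `set` objects where the real index would use Roaring Bitmaps, and it reduces the selector to a single tenant filter; both simplifications, and all method names, are assumptions made for illustration.

```python
class ImpactIndex:
    """Toy purl -> image-id inverted index (sets stand in for roaring bitmaps)."""

    def __init__(self):
        self._contains = {}   # purl -> set of image ids that contain it
        self._used_by = {}    # purl -> set of image ids whose entrypoint uses it
        self._tenant = {}     # image id -> tenant id (minimal owner map)

    def add_image(self, image_id, tenant_id, purls, used_purls=()):
        """Merge one image's BOM-Index into the global maps."""
        self._tenant[image_id] = tenant_id
        for purl in purls:
            self._contains.setdefault(purl, set()).add(image_id)
        for purl in used_purls:
            # Keep UsedBy a subset of Contains.
            self._contains.setdefault(purl, set()).add(image_id)
            self._used_by.setdefault(purl, set()).add(image_id)

    def remove_image(self, image_id):
        """Clear an image id from every bitmap on removal or expiry."""
        self._tenant.pop(image_id, None)
        for index in (self._contains, self._used_by):
            for ids in index.values():
                ids.discard(image_id)

    def resolve_by_purls(self, purls, usage_only, tenant_id):
        """Union the per-purl sets, then keep only the tenant's images."""
        source = self._used_by if usage_only else self._contains
        hits = set()
        for purl in purls:
            hits |= source.get(purl, set())
        return {i for i in hits if self._tenant.get(i) == tenant_id}
```

Resolution is a union of precomputed sets followed by a filter, which is why lookups can stay fast even for large change lists.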
|
||||
|
||||
---
|
||||
|
||||
## 5) External interfaces (REST)
|
||||
|
||||
Base path: `/api/v1/scheduler` (Authority OpToks; scopes: `scheduler.read`, `scheduler.admin`).
|
||||
|
||||
### 5.1 Schedules CRUD
|
||||
|
||||
* `POST /schedules` → create
|
||||
* `GET /schedules` → list (filter by tenant)
|
||||
* `GET /schedules/{id}` → details + next run
|
||||
* `PATCH /schedules/{id}` → pause/resume/update
|
||||
* `DELETE /schedules/{id}` → delete (soft delete, optional)
|
||||
|
||||
### 5.2 Run control & introspection
|
||||
|
||||
* `POST /run` — ad‑hoc run
|
||||
|
||||
```json
|
||||
{ "mode": "analysis-only|content-refresh", "selection": {...}, "reason": "manual" }
|
||||
```
|
||||
* `GET /runs` — list with paging
|
||||
* `GET /runs/{id}` — status, stats, links to deltas
|
||||
* `POST /runs/{id}/cancel` — best‑effort cancel
|
||||
|
||||
### 5.3 Previews (dry‑run)
|
||||
|
||||
* `POST /preview/impact` — returns **candidate count** and a small sample of impacted digests for given change keys or selection.
|
||||
|
||||
### 5.4 Event webhooks (optional push from Feedser/Vexer)
|
||||
|
||||
* `POST /events/feedser-export`
|
||||
|
||||
```json
|
||||
{ "exportId":"...", "changedProductKeys":["pkg:rpm/openssl", ...], "kev": ["CVE-..."], "window": { "from":"...","to":"..." } }
|
||||
```
|
||||
* `POST /events/vexer-export`
|
||||
|
||||
```json
|
||||
{ "exportId":"...", "changedClaims":[ { "productKey":"pkg:deb/...", "vulnId":"CVE-...", "status":"not_affected→affected"} ], ... }
|
||||
```
|
||||
|
||||
**Security**: webhook requires **mTLS** or a signed `X-Scheduler-Signature` header (HMAC‑SHA‑256 or Ed25519), plus an Authority token.
|
||||
|
||||
---
|
||||
|
||||
## 6) Planner → Runner pipeline
|
||||
|
||||
### 6.1 Planning algorithm (event‑driven)
|
||||
|
||||
```
On Export Event (Feedser/Vexer):
  keys = Normalize(change payload)          # productKeys or vulnIds→productKeys
  usageOnly = schedule/policy hint?         # default true
  sel = Selector for tenant/scope from schedules subscribed to events

  impacted = ImpactIndex.ResolveByPurls(keys, usageOnly, sel)
  impacted = ApplyOwnerFilters(impacted, sel)        # namespaces/repos/labels
  impacted = DeduplicateByDigest(impacted)
  impacted = EnforceLimits(impacted, limits.maxJobs)
  shards = Shard(impacted, byHashPrefix, n=limits.parallelism)

  For each shard:
    Enqueue RunSegment (runId, shard, rate=limits.ratePerSecond)
```
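The dedupe, limit, and shard steps of the plan above can be sketched as one function. This is a Python illustration under stated assumptions: `plan_segments` is a hypothetical name, and "byHashPrefix" is interpreted here as taking the first four bytes of a SHA‑256 over the digest string, modulo the shard count.

```python
import hashlib


def plan_segments(digests, max_jobs, parallelism):
    """Deduplicate, cap at max_jobs, and split digests into `parallelism` shards."""
    unique = list(dict.fromkeys(digests))      # dedupe, keep first-seen order
    limited = unique[:max_jobs]                # enforce the per-run job limit
    shards = [[] for _ in range(parallelism)]
    for digest in limited:
        prefix = hashlib.sha256(digest.encode("utf-8")).digest()[:4]
        shards[int.from_bytes(prefix, "big") % parallelism].append(digest)
    return shards
```

Hash-based sharding is deterministic, so replanning the same impacted set lands each digest in the same shard, which helps keep retried run segments idempotent.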
|
||||
|
||||
**Fairness & pacing**
|
||||
|
||||
* Use **leaky bucket** per tenant and per registry host.
|
||||
* Prioritize **KEV‑tagged** and **critical** first if oversubscribed.
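A leaky bucket for pacing can be sketched as a level that drains at a fixed rate and rejects work when it would overflow. The Python class below is a single-threaded illustration with an injected clock; the class name and parameters are assumptions, and the production service would more likely build on `System.Threading.RateLimiting`.

```python
class LeakyBucket:
    """Leaky bucket as a meter: level drains at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity):
        self.rate = float(rate_per_second)
        self.capacity = float(capacity)
        self.level = 0.0
        self.last = 0.0                   # time of the previous call

    def try_acquire(self, now, cost=1.0):
        """Drain for the elapsed time, then admit the job if it still fits."""
        elapsed = max(0.0, now - self.last)
        self.level = max(0.0, self.level - elapsed * self.rate)
        self.last = now
        if self.level + cost > self.capacity:
            return False
        self.level += cost
        return True
```

One bucket per tenant and one per registry host, as the bullets above suggest, would let a noisy tenant be slowed without starving the others.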
|
||||
|
||||
### 6.2 Nightly planning
|
||||
|
||||
```
At cron tick:
  sel = resolve selection
  candidates = ImpactIndex.ResolveAll(sel)
  if lastReportOlderThanDays present → filter by report age (via Scanner catalog)
  shard & enqueue as above
```
|
||||
|
||||
### 6.3 Execution (Runner)
|
||||
|
||||
* Pop **RunSegment** job → for each image digest:
|
||||
|
||||
* **analysis‑only**: `POST scanner/reports { imageDigest, policyRevision? }`
|
||||
* **content‑refresh**: resolve tag→digest if needed; `POST scanner/scans { imageRef, attest? false }` then `POST /reports`
|
||||
* Collect **delta**: `newFindings`, `newCriticals`/`highs`, `links` (UI deep link, Rekor if present).
|
||||
* Persist per‑image outcome in `runs.{id}.stats` (incremental counters).
|
||||
* Emit `scheduler.rescan.delta` events to **Notify** only when **delta > 0** and matches severity rule.
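The last rule above (emit only when there is a delta and it meets the severity rule) can be written as a small predicate. This Python sketch assumes the per-image delta is available as a severity-to-count mapping and that severities are ordered low < medium < high < critical; both are illustrative assumptions.

```python
# Assumed ordering of severity labels, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def should_emit(new_findings, min_severity):
    """True when at least one new finding is at or above min_severity.

    `new_findings` maps a severity label to the count of new findings.
    """
    threshold = SEVERITY_RANK[min_severity]
    return any(
        count > 0 and SEVERITY_RANK[severity] >= threshold
        for severity, count in new_findings.items()
    )
```

For example, a schedule with `minSeverity: "high"` would emit for one new critical finding but stay silent for three new low findings.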
|
||||
|
||||
---
|
||||
|
||||
## 7) Event model (outbound)
|
||||
|
||||
**Topic**: `rescan.delta` (internal bus → Notify; UI subscribes via backend).
|
||||
|
||||
```json
{
  "tenant": "tenant-01",
  "runId": "324af…",
  "imageDigest": "sha256:…",
  "newCriticals": 1,
  "newHigh": 2,
  "kevHits": ["CVE-2025-..."],
  "topFindings": [
    { "purl":"pkg:rpm/openssl@3.0.12-...","vulnId":"CVE-2025-...","severity":"critical","link":"https://ui/scans/..." }
  ],
  "reportUrl": "https://ui/.../scans/sha256:.../report",
  "attestation": { "uuid":"rekor-uuid", "verified": true },
  "ts": "2025-10-18T03:12:45Z"
}
```
|
||||
|
||||
**Also**: `report.ready` for “no‑change” summaries (digest + zero delta), which Notify can ignore by rule.
|
||||
|
||||
---
|
||||
|
||||
## 8) Security posture
|
||||
|
||||
* **AuthN/Z**: Authority OpToks with `aud=scheduler`; DPoP (preferred) or mTLS.
|
||||
* **Multi‑tenant**: every schedule, run, and event carries `tenantId`; ImpactIndex filters by tenant‑visible images.
|
||||
* **Webhook** callers (Feedser/Vexer) present **mTLS** or **HMAC** and Authority token.
|
||||
* **Input hardening**: size caps on changed key lists; reject >100k keys per event; compress (zstd/gzip) allowed with limits.
|
||||
* **No secrets** in logs; redact tokens and signatures.
|
||||
|
||||
---
|
||||
|
||||
## 9) Observability & SLOs
|
||||
|
||||
**Metrics (Prometheus)**
|
||||
|
||||
* `scheduler.events_total{source, result}`
|
||||
* `scheduler.impact_resolve_seconds{quantile}`
|
||||
* `scheduler.images_selected_total{mode}`
|
||||
* `scheduler.jobs_enqueued_total{mode}`
|
||||
* `scheduler.run_latency_seconds{quantile}` // event → first verdict
|
||||
* `scheduler.delta_images_total{severity}`
|
||||
* `scheduler.rate_limited_total{reason}`
|
||||
|
||||
**Targets**
|
||||
|
||||
* Resolve 10k changed keys → impacted set in **<300 ms** (hot cache).
|
||||
* Event → first rescan verdict in **≤60 s** (p95).
|
||||
* Nightly coverage 50k images in **≤10 min** with 10 workers (analysis‑only).
|
||||
|
||||
**Tracing** (OTEL): spans `plan`, `resolve`, `enqueue`, `report_call`, `persist`, `emit`.
|
||||
|
||||
---
|
||||
|
||||
## 10) Configuration (YAML)
|
||||
|
||||
```yaml
scheduler:
  authority:
    issuer: "https://authority.internal"
    require: "dpop"            # or "mtls"
  queue:
    kind: "redis"              # or "nats"
    url: "redis://redis:6379/4"
  mongo:
    uri: "mongodb://mongo/scheduler"
  impactIndex:
    storage: "rocksdb"         # "rocksdb" | "redis" | "memory"
    warmOnStart: true
    usageOnlyDefault: true
  limits:
    defaultRatePerSecond: 50
    defaultParallelism: 8
    maxJobsPerRun: 50000
  integrates:
    scannerUrl: "https://scanner-web.internal"
    feedserWebhook: true
    vexerWebhook: true
  notifications:
    emitBus: "internal"        # deliver to Notify via internal bus
```
|
||||
|
||||
---
|
||||
|
||||
## 11) UI touch‑points
|
||||
|
||||
* **Schedules** page: CRUD, enable/pause, next run, last run stats, mode (analysis/content), selector preview.
|
||||
* **Runs** page: timeline; heat‑map of deltas; drill‑down to affected images.
|
||||
* **Dry‑run preview** modal: “This Feedser export touches ~3,214 images; projected deltas: ~420 (34 KEV).”
|
||||
|
||||
---
|
||||
|
||||
## 12) Failure modes & degradations
|
||||
|
||||
| Condition                            | Behavior                                                                                 |
| ------------------------------------ | ---------------------------------------------------------------------------------------- |
| ImpactIndex cold / incomplete        | Fall back to **All** selection for nightly; for events, cap to KEV+critical until warmed |
| Feedser/Vexer webhook storm          | Coalesce by exportId; debounce 30–60 s; keep last                                        |
| Scanner under load (429)             | Backoff with jitter; respect per‑tenant/leaky bucket                                     |
| Oversubscription (too many impacted) | Prioritize KEV/critical first; spillover to next window; UI banner shows backlog         |
| Notify down                          | Buffer outbound events in queue (TTL 24h)                                                |
| Mongo slow                           | Cut batch sizes; sample‑log; alert ops; don’t drop runs unless critical                  |
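The webhook-storm row (coalesce by exportId, debounce, keep last) can be illustrated with a tiny debouncer. This Python sketch is an assumption about how that behavior might be realized: `ExportDebouncer` is a hypothetical name, and the quiet period restarts whenever the same exportId is seen again.

```python
class ExportDebouncer:
    """Coalesce repeated export events by exportId and keep only the latest."""

    def __init__(self, quiet_seconds=30.0):
        self.quiet_seconds = quiet_seconds
        self._pending = {}                # exportId -> (last_seen, payload)

    def offer(self, export_id, payload, now):
        """Record an event; a newer payload for the same id replaces the old one."""
        self._pending[export_id] = (now, payload)

    def drain(self, now):
        """Return (exportId, payload) pairs that have been quiet long enough."""
        ready = [
            export_id
            for export_id, (last_seen, _) in self._pending.items()
            if now - last_seen >= self.quiet_seconds
        ]
        return [(export_id, self._pending.pop(export_id)[1]) for export_id in ready]
```

The planner would then run once per drained exportId instead of once per incoming webhook call.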
|
||||
|
||||
---
|
||||
|
||||
## 13) Testing matrix
|
||||
|
||||
* **ImpactIndex**: correctness (purl→image sets), performance, persistence after restart, memory pressure with 1M purls.
|
||||
* **Planner**: dedupe, shard, fairness, limit enforcement, KEV prioritization.
|
||||
* **Runner**: parallel report calls, error backoff, partial failures, idempotency.
|
||||
* **End‑to‑end**: Feedser export → deltas visible in UI in ≤60 s.
|
||||
* **Security**: webhook auth (mTLS/HMAC), DPoP nonce dance, tenant isolation.
|
||||
* **Chaos**: drop scanner availability; simulate registry throttles (content‑refresh mode).
|
||||
* **Nightly**: cron tick correctness across timezones and DST.
|
||||
|
||||
---
|
||||
|
||||
## 14) Implementation notes
|
||||
|
||||
* **Language**: .NET 10 minimal API; Channels‑based pipeline; `System.Threading.RateLimiting`.
|
||||
* **Bitmaps**: Roaring via `RoaringBitmap` bindings; memory‑map large shards if RocksDB used.
|
||||
* **Cron**: Quartz‑style parser with timezone support; clock skew tolerated ±60 s.
|
||||
* **Dry‑run**: use ImpactIndex only; never call scanner.
|
||||
* **Idempotency**: run segments carry deterministic keys; retries safe.
|
||||
* **Backpressure**: per‑tenant buckets; per‑host registry budgets respected when content‑refresh enabled.
|
||||
|
||||
---
|
||||
|
||||
## 15) Sequences (representative)
|
||||
|
||||
**A) Event‑driven rescan (Feedser delta)**
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant FE as Feedser
    participant SCH as Scheduler.Worker
    participant IDX as ImpactIndex
    participant SC as Scanner.WebService
    participant NO as Notify

    FE->>SCH: POST /events/feedser-export {exportId, changedProductKeys}
    SCH->>IDX: ResolveByPurls(keys, usageOnly=true, sel)
    IDX-->>SCH: bitmap(imageIds) → digests list
    SCH->>SC: POST /reports {imageDigest} (batch/sequenced)
    SC-->>SCH: report deltas (new criticals/highs)
    alt delta>0
        SCH->>NO: rescan.delta {digest, newCriticals, links}
    end
```
|
||||
|
||||
**B) Nightly rescan**
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant CRON as Cron
    participant SCH as Scheduler.Worker
    participant IDX as ImpactIndex
    participant SC as Scanner.WebService

    CRON->>SCH: tick (02:00 Europe/Sofia)
    SCH->>IDX: ResolveAll(selector)
    IDX-->>SCH: candidates
    SCH->>SC: POST /reports {digest} (paced)
    SC-->>SCH: results
    SCH-->>SCH: aggregate, store run stats
```
|
||||
|
||||
**C) Content‑refresh (tag followers)**
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant SCH as Scheduler
    participant SC as Scanner
    SCH->>SC: resolve tag→digest (if changed)
    alt digest changed
        SCH->>SC: POST /scans {imageRef} # new SBOM
        SC-->>SCH: scan complete (artifacts)
        SCH->>SC: POST /reports {imageDigest}
    else unchanged
        SCH->>SC: POST /reports {imageDigest} # analysis-only
    end
```
|
||||
|
||||
---
|
||||
|
||||
## 16) Roadmap
|
||||
|
||||
* **Vuln‑centric impact**: pre‑join vuln→purl→images to rank by **KEV** and **exploited‑in‑the‑wild** signals.
|
||||
* **Policy diff preview**: when a staged policy changes, show projected breakage set before promotion.
|
||||
* **Cross‑cluster federation**: one Scheduler instance driving many Scanner clusters (tenant isolation).
|
||||
* **Windows containers**: integrate Zastava runtime hints for Usage view tightening.
|
||||
|
||||
---
|
||||
|
||||
**End — component_architecture_scheduler.md**
|
||||
@@ -31,31 +31,33 @@ Everything here is open‑source and versioned — when you check out a git ta
|
||||
- **03 – [Vision & Road‑map](03_VISION.md)**
|
||||
- **04 – [Feature Matrix](04_FEATURE_MATRIX.md)**
|
||||
|
||||
### Reference & concepts
|
||||
- **05 – [System Requirements Specification](05_SYSTEM_REQUIREMENTS_SPEC.md)**
|
||||
- **07 – [High‑Level Architecture](07_HIGH_LEVEL_ARCHITECTURE.md)**
|
||||
- **08 – Module Architecture Dossiers**
|
||||
- [Scanner](ARCHITECTURE_SCANNER.md)
|
||||
- [Concelier](ARCHITECTURE_CONCELIER.md)
|
||||
- [Excititor](ARCHITECTURE_EXCITITOR.md)
|
||||
- [Signer](ARCHITECTURE_SIGNER.md)
|
||||
- [Attestor](ARCHITECTURE_ATTESTOR.md)
|
||||
- [Authority](ARCHITECTURE_AUTHORITY.md)
|
||||
- [CLI](ARCHITECTURE_CLI.md)
|
||||
- [Web UI](ARCHITECTURE_UI.md)
|
||||
- [Zastava Runtime](ARCHITECTURE_ZASTAVA.md)
|
||||
- [Release & Operations](ARCHITECTURE_DEVOPS.md)
|
||||
- **09 – [API & CLI Reference](09_API_CLI_REFERENCE.md)**
|
||||
- **10 – [Plug‑in SDK Guide](10_PLUGIN_SDK_GUIDE.md)**
|
||||
- **10 – [Concelier CLI Quickstart](10_CONCELIER_CLI_QUICKSTART.md)**
|
||||
- **30 – [Excititor Connector Packaging Guide](dev/30_EXCITITOR_CONNECTOR_GUIDE.md)**
|
||||
- **30 – Developer Templates**
|
||||
- [Excititor Connector Skeleton](dev/templates/excititor-connector/)
|
||||
- **11 – [Authority Service](11_AUTHORITY.md)**
|
||||
- **11 – [Data Schemas](11_DATA_SCHEMAS.md)**
|
||||
- **12 – [Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
|
||||
- **13 – [Release‑Engineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
|
||||
- **30 – [Fixture Maintenance](dev/fixtures.md)**
|
||||
### Reference & concepts
- **05 – [System Requirements Specification](05_SYSTEM_REQUIREMENTS_SPEC.md)**
- **07 – [High‑Level Architecture](07_HIGH_LEVEL_ARCHITECTURE.md)**
- **08 – Module Architecture Dossiers**
  - [Scanner](ARCHITECTURE_SCANNER.md)
  - [Concelier](ARCHITECTURE_CONCELIER.md)
  - [Excititor](ARCHITECTURE_EXCITITOR.md)
  - [Signer](ARCHITECTURE_SIGNER.md)
  - [Attestor](ARCHITECTURE_ATTESTOR.md)
  - [Authority](ARCHITECTURE_AUTHORITY.md)
  - [Notify](ARCHITECTURE_NOTIFY.md)
  - [Scheduler](ARCHITECTURE_SCHEDULER.md)
  - [CLI](ARCHITECTURE_CLI.md)
  - [Web UI](ARCHITECTURE_UI.md)
  - [Zastava Runtime](ARCHITECTURE_ZASTAVA.md)
  - [Release & Operations](ARCHITECTURE_DEVOPS.md)
- **09 – [API & CLI Reference](09_API_CLI_REFERENCE.md)**
- **10 – [Plug‑in SDK Guide](10_PLUGIN_SDK_GUIDE.md)**
- **10 – [Concelier CLI Quickstart](10_CONCELIER_CLI_QUICKSTART.md)**
- **30 – [Excititor Connector Packaging Guide](dev/30_EXCITITOR_CONNECTOR_GUIDE.md)**
- **30 – Developer Templates**
  - [Excititor Connector Skeleton](dev/templates/excititor-connector/)
- **11 – [Authority Service](11_AUTHORITY.md)**
- **11 – [Data Schemas](11_DATA_SCHEMAS.md)**
- **12 – [Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
- **13 – [Release‑Engineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
- **30 – [Fixture Maintenance](dev/fixtures.md)**

### User & operator guides
- **14 – [Glossary](14_GLOSSARY_OF_TERMS.md)**
@@ -64,18 +66,18 @@ Everything here is open‑source and versioned — when you check out a git ta
- **18 – [Coding Standards](18_CODING_STANDARDS.md)**
- **19 – [Test‑Suite Overview](19_TEST_SUITE_OVERVIEW.md)**
- **21 – [Install Guide](21_INSTALL_GUIDE.md)**
- **22 – [CI/CD Recipes Library](ci/20_CI_RECIPES.md)**
- **23 – [FAQ](23_FAQ_MATRIX.md)**
- **24 – [Offline Update Kit Admin Guide](24_OFFLINE_KIT.md)**
- **25 – [Concelier Apple Connector Operations](ops/concelier-apple-operations.md)**
- **26 – [Authority Key Rotation Playbook](ops/authority-key-rotation.md)**
- **27 – [Concelier CCCS Connector Operations](ops/concelier-cccs-operations.md)**
- **28 – [Concelier CISA ICS Connector Operations](ops/concelier-icscisa-operations.md)**
- **29 – [Concelier CERT-Bund Connector Operations](ops/concelier-certbund-operations.md)**
- **30 – [Concelier MSRC Connector – AAD Onboarding](ops/concelier-msrc-operations.md)**

### Legal & licence
- **31 – [Legal & Quota FAQ](29_LEGAL_FAQ_QUOTA.md)**
- **22 – [CI/CD Recipes Library](ci/20_CI_RECIPES.md)**
- **23 – [FAQ](23_FAQ_MATRIX.md)**
- **24 – [Offline Update Kit Admin Guide](24_OFFLINE_KIT.md)**
- **25 – [Concelier Apple Connector Operations](ops/concelier-apple-operations.md)**
- **26 – [Authority Key Rotation Playbook](ops/authority-key-rotation.md)**
- **27 – [Concelier CCCS Connector Operations](ops/concelier-cccs-operations.md)**
- **28 – [Concelier CISA ICS Connector Operations](ops/concelier-icscisa-operations.md)**
- **29 – [Concelier CERT-Bund Connector Operations](ops/concelier-certbund-operations.md)**
- **30 – [Concelier MSRC Connector – AAD Onboarding](ops/concelier-msrc-operations.md)**

### Legal & licence
- **31 – [Legal & Quota FAQ](29_LEGAL_FAQ_QUOTA.md)**

</details>

@@ -9,6 +9,9 @@
| DOC5.Concelier-Runbook | DONE (2025-10-12) | Docs Guild | DOC3.Concelier-Authority | Produce dedicated Concelier authority audit runbook covering log fields, monitoring recommendations, and troubleshooting steps. | ✅ Runbook published; ✅ linked from DOC3/DOC5; ✅ alerting guidance included. |
| FEEDDOCS-DOCS-05-001 | DONE (2025-10-11) | Docs Guild | FEEDMERGE-ENGINE-04-001, FEEDMERGE-ENGINE-04-002 | Publish Concelier conflict resolution runbook covering precedence workflow, merge-event auditing, and Sprint 3 metrics. | ✅ `docs/ops/concelier-conflict-resolution.md` committed; ✅ metrics/log tables align with latest merge code; ✅ Ops alert guidance handed to Concelier team. |
| FEEDDOCS-DOCS-05-002 | DONE (2025-10-16) | Docs Guild, Concelier Ops | FEEDDOCS-DOCS-05-001 | Ops sign-off captured: conflict runbook circulated, alert thresholds tuned, and rollout decisions documented in change log. | ✅ Ops review recorded; ✅ alert thresholds finalised using `docs/ops/concelier-authority-audit-runbook.md`; ✅ change-log entry linked from runbook once GHSA/NVD/OSV regression fixtures land. |
| DOCS-ADR-09-001 | TODO | Docs Guild, DevEx | — | Establish ADR process (`docs/adr/0000-template.md`) and document usage guidelines. | Template published; README snippet linking ADR process; announcement posted. |
| DOCS-EVENTS-09-002 | TODO | Docs Guild, Platform Events | SCANNER-EVENTS-15-201 | Publish event schema catalog (`docs/events/`) for `scanner.report.ready@1`, `scheduler.rescan.delta@1`, `attestor.logged@1`. | Schemas validated; docs/events/README summarises usage; Notify/Scheduler teams acknowledge. |
| DOCS-RUNTIME-17-004 | TODO | Docs Guild, Runtime Guild | SCANNER-EMIT-17-701, ZASTAVA-OBS-17-005, DEVOPS-REL-17-002 | Document build-id workflows: SBOM exposure, runtime event payloads, debug-store layout, and operator guidance for symbol retrieval. | Architecture + operator docs updated with build-id sections, examples show `readelf` output + debuginfod usage, references linked from Offline Kit/Release guides. |

> Update statuses (TODO/DOING/REVIEW/DONE/BLOCKED) as progress changes. Keep guides in sync with configuration samples under `etc/`.

18
docs/adr/0000-template.md
Normal file
@@ -0,0 +1,18 @@
# ADR-0000: Title

## Status
Proposed

## Context
- What decision needs to be made?
- What are the forces (requirements, constraints, stakeholders)?

## Decision
- Summary of the chosen option.

## Consequences
- Positive/negative consequences.
- Follow-up actions or tasks.

## References
- Links to related ADRs, issues, documents.
9
docs/events/README.md
Normal file
@@ -0,0 +1,9 @@
# Event Envelope Schemas

Versioned JSON Schemas for platform events consumed by Scheduler, Notify, and UI.

- `scanner.report.ready@1.json`
- `scheduler.rescan.delta@1.json`
- `attestor.logged@1.json`

Producers must bump the version suffix when introducing breaking changes; consumers validate incoming payloads against these schemas.
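
The naming rule above can be sketched as a small helper. This is a minimal illustration, assuming schema files are always named `<kind>@<version>.json` as in the list above; the function names `schema_filename` and `next_breaking_version` are hypothetical and not part of the repository.

```python
import re

# Assumption: an event name is a dotted lowercase kind, "@", and a positive
# integer version, matching the three file names listed in this README.
_EVENT_NAME = re.compile(r"(?P<kind>[a-z]+(?:\.[a-z]+)+)@(?P<version>[1-9][0-9]*)")


def _parse(event_name):
    """Split 'kind@version' into (kind, version) or raise ValueError."""
    match = _EVENT_NAME.fullmatch(event_name)
    if match is None:
        raise ValueError(f"not a versioned event name: {event_name!r}")
    return match["kind"], int(match["version"])


def schema_filename(event_name):
    """Map e.g. 'attestor.logged@1' to the schema file a consumer would load."""
    kind, version = _parse(event_name)
    return f"{kind}@{version}.json"


def next_breaking_version(event_name):
    """Name a producer would publish under after a breaking change."""
    kind, version = _parse(event_name)
    return f"{kind}@{version + 1}"
```

A consumer could call `schema_filename("scanner.report.ready@1")` to locate the schema, while a producer planning a breaking change would move to the name returned by `next_breaking_version`.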
38
docs/events/attestor.logged@1.json
Normal file
@@ -0,0 +1,38 @@
{
  "$id": "https://stella-ops.org/schemas/events/attestor.logged@1.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["eventId", "kind", "tenant", "ts", "payload"],
  "properties": {
    "eventId": {"type": "string", "format": "uuid"},
    "kind": {"const": "attestor.logged"},
    "tenant": {"type": "string"},
    "ts": {"type": "string", "format": "date-time"},
    "payload": {
      "type": "object",
      "required": ["artifactSha256", "rekor", "subject"],
      "properties": {
        "artifactSha256": {"type": "string"},
        "rekor": {
          "type": "object",
          "required": ["uuid", "url"],
          "properties": {
            "uuid": {"type": "string"},
            "url": {"type": "string", "format": "uri"},
            "index": {"type": "integer", "minimum": 0}
          }
        },
        "subject": {
          "type": "object",
          "required": ["type", "name"],
          "properties": {
            "type": {"enum": ["sbom", "report", "vex-export"]},
            "name": {"type": "string"}
          }
        }
      },
      "additionalProperties": true
    }
  },
  "additionalProperties": false
}
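
To make the `required` lists in the schema above concrete, here is a minimal sketch that builds a sample `attestor.logged` event and reports which required keys are missing. It is not a JSON Schema validator: it checks key presence only. The tenant name, digest, and Rekor URL are placeholder values, and `missing_required` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timezone

# Required keys copied from the schema above, keyed by the path of the
# object they belong to (the empty tuple is the envelope itself).
REQUIRED = {
    (): ["eventId", "kind", "tenant", "ts", "payload"],
    ("payload",): ["artifactSha256", "rekor", "subject"],
    ("payload", "rekor"): ["uuid", "url"],
    ("payload", "subject"): ["type", "name"],
}


def missing_required(event):
    """Return dotted paths of required keys that are absent from the event."""
    missing = []
    for path, keys in REQUIRED.items():
        node = event
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
        if not isinstance(node, dict):
            # Parent is absent or not an object; an absent parent is already
            # reported as missing one level up.
            continue
        missing.extend(".".join(path + (key,)) for key in keys if key not in node)
    return missing


# Placeholder values only; none of them refer to a real tenant or log entry.
sample_event = {
    "eventId": str(uuid.uuid4()),
    "kind": "attestor.logged",
    "tenant": "tenant-a",
    "ts": datetime.now(timezone.utc).isoformat(),
    "payload": {
        "artifactSha256": "0" * 64,
        "rekor": {
            "uuid": "example-entry",
            "url": "https://rekor.example.invalid/entries/example-entry",
            "index": 0,
        },
        "subject": {"type": "sbom", "name": "example/image"},
    },
}
```

Calling `missing_required(sample_event)` returns an empty list; dropping `payload.rekor.url` from the sample would make it return `["payload.rekor.url"]`.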
46
docs/events/scanner.report.ready@1.json
Normal file
@@ -0,0 +1,46 @@
{
  "$id": "https://stella-ops.org/schemas/events/scanner.report.ready@1.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["eventId", "kind", "tenant", "ts", "scope", "payload"],
  "properties": {
    "eventId": {"type": "string", "format": "uuid"},
    "kind": {"const": "scanner.report.ready"},
    "tenant": {"type": "string"},
    "ts": {"type": "string", "format": "date-time"},
    "scope": {
      "type": "object",
      "required": ["repo", "digest"],
      "properties": {
        "namespace": {"type": "string"},
        "repo": {"type": "string"},
        "digest": {"type": "string"}
      }
    },
    "payload": {
      "type": "object",
      "required": ["verdict", "delta", "links"],
      "properties": {
        "verdict": {"enum": ["pass", "warn", "fail"]},
        "delta": {
          "type": "object",
          "properties": {
            "newCritical": {"type": "integer", "minimum": 0},
            "newHigh": {"type": "integer", "minimum": 0},
            "kev": {"type": "array", "items": {"type": "string"}}
          }
        },
        "links": {
          "type": "object",
          "properties": {
            "ui": {"type": "string", "format": "uri"},
            "rekor": {"type": "string", "format": "uri"}
          },
          "additionalProperties": false
        }
      },
      "additionalProperties": true
    }
  },
  "additionalProperties": false
}
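
The schema above is closed at two levels (`additionalProperties: false` on the envelope and on `payload.links`) but open on `payload` itself. The following sketch illustrates that difference by listing keys the schema would reject. It assumes `payload` and `links` are objects when present, and `unexpected_keys` is a hypothetical helper, not an existing API.

```python
# Key sets copied from the "properties" blocks of the schema above.
ENVELOPE_KEYS = {"eventId", "kind", "tenant", "ts", "scope", "payload"}
LINK_KEYS = {"ui", "rekor"}


def unexpected_keys(event):
    """Return, sorted, the keys that the closed objects do not allow."""
    extras = [key for key in event if key not in ENVELOPE_KEYS]
    links = event.get("payload", {}).get("links", {})
    extras += [f"payload.links.{key}" for key in links if key not in LINK_KEYS]
    # Extra keys directly under "payload" are deliberately not reported:
    # that object is declared with "additionalProperties": true.
    return sorted(extras)
```

With this shape, a producer can add a new field under `payload` without a version bump, whereas a new envelope field or a new link type would be rejected by existing consumers.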
33
docs/events/scheduler.rescan.delta@1.json
Normal file
@@ -0,0 +1,33 @@
{
  "$id": "https://stella-ops.org/schemas/events/scheduler.rescan.delta@1.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["eventId", "kind", "tenant", "ts", "payload"],
  "properties": {
    "eventId": {"type": "string", "format": "uuid"},
    "kind": {"const": "scheduler.rescan.delta"},
    "tenant": {"type": "string"},
    "ts": {"type": "string", "format": "date-time"},
    "payload": {
      "type": "object",
      "required": ["scheduleId", "impactedDigests", "summary"],
      "properties": {
        "scheduleId": {"type": "string"},
        "impactedDigests": {
          "type": "array",
          "items": {"type": "string"}
        },
        "summary": {
          "type": "object",
          "properties": {
            "newCritical": {"type": "integer", "minimum": 0},
            "newHigh": {"type": "integer", "minimum": 0},
            "total": {"type": "integer", "minimum": 0}
          }
        }
      },
      "additionalProperties": true
    }
  },
  "additionalProperties": false
}
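
The three `summary` counters above share the constraint `{"type": "integer", "minimum": 0}`. A consumer-side check for that constraint might look like the sketch below. It accepts only Python `int` values (excluding `bool`); a full JSON Schema validator may treat integral floats differently. `invalid_counts` is a hypothetical helper name.

```python
# Counter fields declared under payload.summary in the schema above.
COUNT_FIELDS = ("newCritical", "newHigh", "total")


def invalid_counts(summary):
    """Return the counter fields whose values are not non-negative integers."""
    bad = []
    for field in COUNT_FIELDS:
        if field not in summary:
            continue  # the schema does not mark any counter as required
        value = summary[field]
        # bool is a subclass of int in Python, but JSON true/false are not
        # integers, so booleans are rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            bad.append(field)
    return bad
```

For example, `invalid_counts({"newCritical": 2, "newHigh": 0, "total": 5})` returns an empty list, while a negative or string-valued counter is reported by name.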