feat(docs): Add comprehensive documentation for Vexer, Vulnerability Explorer, and Zastava modules
- Introduced AGENTS.md, README.md, TASKS.md, and implementation_plan.md for Vexer, detailing mission, responsibilities, key components, and operational notes.
- Established similar documentation structure for Vulnerability Explorer and Zastava modules, including their respective workflows, integrations, and observability notes.
- Created risk scoring profiles documentation outlining the core workflow, factor model, governance, and deliverables.
- Ensured all modules adhere to the Aggregation-Only Contract and maintain determinism and provenance in outputs.
@@ -1,210 +0,0 @@
# StellaOps Console Architecture (Sprint 23)

> **Ownership:** Console Guild • Docs Guild
> **Delivery scope:** `StellaOps.Web` Angular workspace, Console Web Gateway routes (`/console/*`), Downloads manifest surfacing, SSE fan-out for Scheduler & telemetry.
> **Related docs:** [Console overview](../ui/console-overview.md), [Navigation](../ui/navigation.md), [Runs workspace](../ui/runs.md), [Downloads](../ui/downloads.md), [Console security posture](../security/console-security.md), [Console observability](../observability/ui-telemetry.md), [Deployment guide](../deploy/console.md)

This dossier describes the end-to-end architecture of the StellaOps Console as delivered in Sprint 23. It covers the Angular workspace layout, API/gateway integration points, live-update channels, performance budgets, offline workflows, and observability hooks needed to keep the console deterministic and air-gap friendly.

---

## 1 · Mission & Boundaries

- Present an operator-grade UI that surfaces Concelier, Excititor, Policy Engine, Scheduler, Attestor, and SBOM Service data **without** mutating aggregation or policy state.
- Enforce Authority-issued scopes and tenant claims on every call through the Console Web Gateway.
- Deliver deterministic builds (< 1 MB initial bundle) that can be mirrored in Offline Kits, with runtime configuration loaded from `/config.json`.
- Stream live status (ingestion deltas, scheduler progress, telemetry) via SSE with graceful degradation to polling when offline or throttled.
- Maintain CLI parity by embedding `stella` commands alongside interactive actions.

Non-goals: authoring ingestion logic, mutating Policy overlays, exposing internal Mongo collections, or performing cryptographic signing in-browser.

---

## 2 · Workspace & Packages

The console is implemented in `src/Web/StellaOps.Web`, an Angular 17 workspace built on standalone components and Signals.

| Path | Purpose | Highlights |
|------|---------|------------|
| `src/app/core/auth` | DPoP + PKCE authentication, Authority session store, HTTP interceptors. | WebCrypto keygen (`crypto.subtle`), session metadata persisted in `sessionStorage`, DPoP nonce replay guard. |
| `src/app/core/api` | Typed API clients for Console gateway (`/console/*`) and downstream services. | DTOs for Scanner, Notify, Concelier exporters; fetch-based clients with abort signals. |
| `src/app/core/config` | Runtime configuration loader (`/config.json`), feature flag gating. | Supports air-gap overrides and injects API base URLs, Authority issuer/client. |
| `src/app/features/*` | Route-level shells (auth bootstrap, scans detail, notifications inbox, Trivy DB settings). | Each feature is a standalone module with lazy loading and Angular Signals state. |
| `src/app/testing` | Fixtures and harnesses used in unit tests and storybook-like previews. | Deterministic data used for Playwright and Jest scenarios. |

Workspace characteristics:

- **Toolchain:** Node 20.11+, npm 10.2+, Angular CLI 17.3. `npm run ci:install` primes dependencies without network audits; `scripts/verify-chromium.js` ensures headless Chromium availability for Karma.
- **Build budgets:** `angular.json` enforces 500 KB warning / 1 MB error for initial bundle and 2 KB warning / 4 KB error per component stylesheet. Output hashing (`outputHashing: all`) keeps assets cache-safe.
- **Testing:** Karma + Jasmine for unit tests, Playwright for e2e with dev server autotuning. CI (`DEVOPS-CONSOLE-23-001`) runs Lighthouse against the production bundle.
- **Runtime config:** `/config.json` merged at bootstrap; gateways can rewrite it on the fly to avoid rebuilding for environment changes.

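The build budgets listed above amount to a two-threshold check per artefact type. The following is a minimal sketch of that check, not the Angular CLI implementation: the function name `classify_bundle`, the `BUDGETS` table, and the assumption that 1 KB means 1024 bytes are all illustrative.

```python
KB = 1024  # assumption: budgets are expressed in binary kilobytes

# (warning threshold, error threshold) in bytes, taken from the budgets in the text.
BUDGETS = {
    "initial": (500 * KB, 1024 * KB),
    "component_style": (2 * KB, 4 * KB),
}


def classify_bundle(kind: str, size_bytes: int) -> str:
    """Return 'ok', 'warning', or 'error' for a measured artefact size."""
    warn_at, fail_at = BUDGETS[kind]
    if size_bytes >= fail_at:
        return "error"  # CI would fail the build at this point
    if size_bytes >= warn_at:
        return "warning"
    return "ok"


# Hypothetical usage: a 620 KB initial bundle only triggers the warning tier.
status = classify_bundle("initial", 620 * KB)
```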
---

## 3 · Runtime Topology & Data Flow

The console SPA relies on the Console Web Gateway to proxy tenant-scoped API calls to downstream services. Tenant isolation and Aggregation-Only guardrails are enforced at every hop.

```mermaid
graph TD
    subgraph Browser["Browser (Angular SPA)"]
        UI["Console Shell<br/>(Signals, Feature Modules)"]
        SSE[EventSource / SSE Clients]
    end
    subgraph Gateway["Console Web Gateway"]
        Router[Minimal API / ASP.NET Core Router]
        StatusCache[Status Cache & Manifest signer]
    end
    Authority["Authority<br/>(DPoP + PKCE)"]
    Concelier[Concelier.WebService]
    Excititor[Excititor.WebService]
    Scheduler[Scheduler.WebService]
    Policy[Policy Engine API]
    SBOM[SBOM Service]
    Attestor[Attestor API]
    Downloads[Downloads Manifest Store]

    UI -->|/config.json| Gateway
    UI -->|"/console/* (Bearer+DPoP)"| Router
    SSE -->|/console/status/stream| Router
    Router --> Authority
    Router --> Concelier
    Router --> Excititor
    Router --> Scheduler
    Router --> Policy
    Router --> SBOM
    Router --> Attestor
    Router --> Downloads
    StatusCache -.-> Gateway
    Gateway -.-> UI
```

Key interactions:

- **Auth bootstrap:** UI retrieves Authority metadata and exchanges an authorization code + PKCE verifier for a DPoP-bound token (`aud=console`, `tenant=<id>`). Tokens expire in 120 s; refresh tokens rotate, triggering new DPoP proofs.
- **Tenant switch:** Picker issues `Authority /fresh-auth` when required, then refreshes UI caches (`ui.tenant.switch` log). Gateway injects `X-Stella-Tenant` headers downstream.
- **Aggregation-only reads:** Gateway proxies `/console/advisories`, `/console/vex`, `/console/findings`, etc., without mutating Concelier or Policy data. Provenance badges and merge hashes come directly from upstream responses.
- **Downloads parity:** `/console/downloads` merges DevOps signed manifest and Offline Kit metadata; UI renders digest, signature, and CLI parity command.
- **Offline resilience:** Gateway exposes `/console/status` heartbeat. If unavailable, UI enters offline mode, disables SSE, and surfaces CLI fallbacks.

---

## 4 · Live Updates & SSE Design

Live surfaces use HTTP/1.1 SSE with heartbeat frames to keep operators informed without polling storms.

| Endpoint | Payload | Source | Behaviour |
|----------|---------|--------|-----------|
| `/console/status/stream` | `statusChanged`, `ingestionDelta`, `attestorQueue`, `offlineBanner` events | Concelier WebService, Excititor WebService, Attestor metrics | 5 s heartbeat; gateway disables proxy buffering (`X-Accel-Buffering: no`) and sets `Cache-Control: no-store`. |
| `/console/runs/{id}/stream` | `stateChanged`, `segmentProgress`, `deltaSummary`, `log` | Scheduler WebService SSE fan-out | Event payloads carry `traceId`, `runId`, `tenant`; UI reconnects with exponential backoff and resumes using `Last-Event-ID`. |
| `/console/telemetry/stream` | `metricSample`, `alert`, `collectorStatus` | Observability aggregator | Gated by `ui.telemetry` scope; disabled when `CONSOLE_TELEMETRY_SSE_ENABLED=false`. |

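To make the frame handling in the table concrete, here is a small sketch of how a client could split an SSE text stream into named events while ignoring comment heartbeats and tracking the last event ID for resumption. It follows the standard `text/event-stream` field rules; the function name `parse_sse` and the sample payload are illustrative, and a browser client would normally rely on `EventSource` rather than hand-rolled parsing.

```python
def parse_sse(stream_text: str) -> list:
    """Parse SSE text into events; comment lines (heartbeats) are skipped."""
    events = []
    event, data, last_id = "message", [], None
    for line in stream_text.split("\n"):
        if line == "":
            # A blank line dispatches the pending event, if it carries data.
            if data:
                events.append({"event": event, "data": "\n".join(data), "id": last_id})
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment frame, used by the gateway as a heartbeat
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # would be sent back as Last-Event-ID on reconnect
    return events


# Hypothetical stream: one heartbeat followed by one run event.
sample = ': heartbeat\n\nid: 7\nevent: stateChanged\ndata: {"runId": "42"}\n\n'
parsed = parse_sse(sample)
```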
Sequence overview:

```mermaid
sequenceDiagram
    autonumber
    participant UI as Console SPA
    participant GW as Console Gateway
    participant SCHED as Scheduler WebService

    UI->>GW: GET /console/runs/42/stream (Authorization + DPoP)
    GW->>SCHED: GET /runs/42/stream (X-Stella-Tenant)
    SCHED-->>GW: event: stateChanged data: {...}
    GW-->>UI: event: stateChanged data: {..., traceId}
    Note over UI,GW: Gateway injects retry-after + heartbeat every 15s
    UI-->>GW: (disconnect)
    UI->>GW: GET /console/runs/42/stream (Last-Event-ID: <seq>)
    GW->>SCHED: GET /runs/42/stream?since=<seq>
```

Offline behaviour:

- If SSE fails three times within 60 s, UI falls back to polling (`/console/status`, `/console/runs/{id}`) every 30 s and shows an amber banner.
- When `console.offlineMode=true`, SSE endpoints return `204` immediately; UI suppresses auto-reconnect to preserve resources.

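The fallback rule above (three SSE failures inside 60 s switch the UI to 30 s polling) can be modelled as a sliding-window counter. This is a sketch under stated assumptions: the class name `SseFallbackTracker` and its parameters are hypothetical, and timestamps are passed in explicitly so the logic stays deterministic.

```python
from collections import deque


class SseFallbackTracker:
    """Sliding-window failure counter deciding when to fall back to polling."""

    def __init__(self, max_failures: int = 3, window_s: float = 60.0,
                 poll_interval_s: float = 30.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self.poll_interval_s = poll_interval_s
        self._failures = deque()

    def record_failure(self, now: float) -> bool:
        """Record a failure at `now` (seconds); True means switch to polling."""
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures


# Hypothetical usage: three failures within 20 s trip the fallback.
tracker = SseFallbackTracker()
decisions = [tracker.record_failure(t) for t in (0.0, 10.0, 20.0)]
```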
---

## 5 · Performance & Budgets

| Surface | Target | Enforcement |
|---------|--------|-------------|
| First meaningful paint (dashboard) | ≤ 2.5 s on 4 vCPU offline runner | Lighthouse CI gate (`DEVOPS-CONSOLE-23-001`), `ui_route_render_seconds` P95 alert. |
| Route hydration (feature shells) | ≤ 1.5 s after token acquisition | Angular Signals + lazy loading; route-level budgets tracked via custom telemetry. |
| Initial bundle size | Warn ≥ 500 KB, fail ≥ 1 MB | `angular.json` budgets; CI fails build on overflow. |
| Component stylesheet | Warn ≥ 2 KB, fail ≥ 4 KB | `angular.json` budgets; ensures Tailwind utilities stay tree-shaken. |
| SSE heartbeat | Every 15 s max | Gateway emits comment heartbeats; UI resets timers on each frame. |

Optimisation levers:

- Standalone components with `ChangeDetectionStrategy.OnPush` and Angular Signals avoid zone.js churn.
- `fetch` + AbortController guard double fetches.
- Assets served with immutable caching (`cache-control: public, max-age=31536000, immutable`) thanks to hashed filenames.
- Compression (gzip/brotli) enabled at gateway; offline bundles include precompressed assets.
- Command palette, tenants, and filters rely on IndexedDB caches to avoid refetching static metadata.

---

## 6 · Offline & Configuration Workflows

- **Config manifest:** `/config.json` includes Authority issuer/client ID, gateway base URL, feature flags, telemetry endpoints, and offline hints. Operators can swap config by copying `src/config/config.sample.json` and editing before build, or by rewriting the response at gateway runtime.
- **Deterministic install:** Documented in `src/Web/StellaOps.Web/docs/DeterministicInstall.md`: `npm run ci:install` plus Chromium provisioning ensures offline runners reproduce builds.
- **Offline Kit parity:** UI validates downloads manifest signatures (cosign) and surfaces snapshot timestamps per tenant. When offline, buttons switch to CLI snippets (`stella runs export`, `stella downloads sync`).
- **Feature flags:** `CONSOLE_FEATURE_FLAGS` toggles modules (`runs`, `downloads`, `telemetry`); offline bundles include flag manifest so UI can render only supported panes.
- **Snapshot awareness:** Global banner shows snapshot timestamp and disables actions needing Authority fresh-auth when running in sealed mode.

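The config and feature-flag workflow above can be pictured as a recursive merge of built-in defaults with the `/config.json` payload, followed by a filter over the flag map. The sketch below is illustrative only: the key names in `DEFAULTS` (`apiBaseUrl`, `offlineMode`, `featureFlags`) are assumptions, not the documented schema.

```python
import json

# Hypothetical defaults; the real /config.json schema is not shown in this document.
DEFAULTS = {
    "apiBaseUrl": "/console",
    "offlineMode": False,
    "featureFlags": {"runs": True, "downloads": True, "telemetry": False},
}


def merge_config(defaults: dict, override: dict) -> dict:
    """Recursively merge `override` into `defaults` without mutating either."""
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def enabled_modules(config: dict) -> list:
    """Return the sorted names of modules whose feature flag is on."""
    return sorted(name for name, on in config.get("featureFlags", {}).items() if on)


# Hypothetical air-gap override served by the gateway.
payload = json.loads('{"offlineMode": true, "featureFlags": {"telemetry": true, "runs": false}}')
config = merge_config(DEFAULTS, payload)
modules = enabled_modules(config)
```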
---

## 7 · Security & Tenancy

- **DPoP + PKCE:** Every request carries `Authorization` and `DPoP` headers, and the gateway enforces nonce replay protection. Private keys live in IndexedDB and never leave the browser.
- **Scope enforcement:** Gateway checks scope claims before proxying (`ui.read`, `runs.manage`, `downloads.read`, etc.) and propagates denials as `Problem+JSON` with `ERR_*` codes.
- **Tenant propagation:** `X-Stella-Tenant` header derived from token; downstream services reject mismatches. Tenant switches log `ui.tenant.switch` and require fresh-auth for privileged actions.
- **CSP & headers:** Default CSP forbids third-party scripts and allows only same-origin `connect-src`. HSTS, Referrer-Policy `no-referrer`, and `Permissions-Policy` are configured via the gateway (`deploy/console.md`).
- **Evidence handling:** Downloads never cache secrets; UI renders SHA-256 + signature references and steers users to CLI for sensitive exports.
- See [Console security posture](../security/console-security.md) for full scope table and threat model alignment.

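As a rough illustration of the scope-enforcement bullet, the sketch below checks a space-delimited `scope` claim against a route's required scope and builds a Problem+JSON-style denial. The route table, the `code` extension member, and the value `ERR_SCOPE_MISSING` are hypothetical; the document only states that denials use `ERR_*` codes.

```python
# Hypothetical route-to-scope mapping; scope names are taken from the text.
ROUTE_SCOPES = {
    "GET /console/runs": "ui.read",
    "POST /console/runs/cancel": "runs.manage",
    "GET /console/downloads": "downloads.read",
}


def authorize(route: str, claims: dict):
    """Return None when the call is allowed, else a problem-details dict."""
    required = ROUTE_SCOPES[route]
    granted = set(claims.get("scope", "").split())
    if required in granted:
        return None
    return {
        "type": "about:blank",
        "title": "Forbidden",
        "status": 403,
        "detail": f"Scope '{required}' is required for {route}.",
        "code": "ERR_SCOPE_MISSING",  # hypothetical ERR_* code
    }


# Hypothetical usage: a read-only token cannot cancel runs.
denial = authorize("POST /console/runs/cancel", {"scope": "ui.read", "tenant": "acme"})
```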
---

## 8 · Observability & Telemetry

- **Metrics:** Prometheus scrape at `/metrics` (enabled when `CONSOLE_METRICS_ENABLED=true`). Key histograms/counters documented in [Console observability](../observability/ui-telemetry.md) (`ui_route_render_seconds`, `ui_tenant_switch_total`, `ui_download_manifest_refresh_seconds`).
- **Logs:** Structured JSON with `traceId`, `tenant`, `action`. Categories include `ui.action`, `ui.tenant.switch`, `ui.security.anomaly`. Sampled per feature flag to balance volume.
- **Traces:** Browser OTLP exporter ships spans to configured collector; gateway adds server-side spans so traces cross client/server boundary.
- **Alerts:** Burn-rate rules for route latency, telemetry batch failures, download manifest refresh, and SSE stalls integrate with Notifier.
- **Correlation:** SSE events carry `traceId` so operators can jump from UI to backend logs using shared correlation IDs.

---

## 9 · Integration Points & Dependencies

| Service | Console dependency | Notes |
|---------|-------------------|-------|
| Authority | OIDC, DPoP tokens, tenant catalog, fresh-auth | Requires client `console-ui` with scopes listed in security guide. |
| Concelier WebService | `/console/advisories`, feed health, export triggers | Gateway must enforce Aggregation-Only guardrails and surface merge hashes. |
| Excititor WebService | `/console/vex`, consensus overlays | SSE ticker shows provider deltas. |
| Policy Engine | Findings views, policy previews, simulation diffs | Console never writes overlays; uses `effective_finding_*` data via API. |
| Scheduler WebService | Runs dashboard, SSE streams, queue metrics | Heartbeat drives status ticker; cancellation actions require `runs.manage`. |
| SBOM Service | SBOM explorer tree, component lookup | Responses cached per tenant; offline bundles preload snapshots. |
| Attestor | Attestation verification, evidence links | Console displays verification status and CLI parity commands. |
| DevOps downloads pipeline | Signed manifest for `/console/downloads` | Manifest signatures validated with cosign key shipped in Offline Kit. |

---

## 10 · Compliance Checklist

- [ ] Frontend package map (core/auth/api/config + feature shells) documented with ownership and tooling details.
- [ ] Data flow diagram captures SPA ↔ Gateway ↔ downstream services with tenant & scope enforcement notes.
- [ ] SSE design documented (endpoints, payloads, heartbeat, retry/backoff, offline fallback).
- [ ] Performance budgets (< 1 MB initial bundle, route hydration ≤ 1.5 s, SSE heartbeat) stated alongside enforcement mechanisms.
- [ ] Offline workflows (`/config.json`, deterministic install, Offline Kit parity) described with operator guidance.
- [ ] Security section references DPoP, scopes, CSP, evidence handling, and tenancy propagation.
- [ ] Observability metrics/logs/traces coverage listed with alert hooks.
- [ ] Integration dependencies table links Console responsibilities to upstream services.
- [ ] Document cross-references validated (UI guides, security, observability, deployment).
- [ ] Last updated timestamp refreshed after review.

---

*Last updated: 2025-10-27 (Sprint 23).*

@@ -1,168 +0,0 @@
# StellaOps Architecture Overview (Sprint 19)

> **Ownership:** Architecture Guild • Docs Guild
> **Audience:** Service owners, platform engineers, solution architects
> **Related:** [High-Level Architecture](../07_HIGH_LEVEL_ARCHITECTURE.md), [Concelier Architecture](../ARCHITECTURE_CONCELIER.md), [Policy Engine Architecture](policy-engine.md), [Aggregation-Only Contract](../ingestion/aggregation-only-contract.md)

This dossier summarises the end-to-end runtime topology after the Aggregation-Only Contract (AOC) rollout. It highlights where raw facts live, how ingest services enforce guardrails, and how downstream components consume those facts to derive policy decisions and user-facing experiences.

---

## 1 · System landscape

```mermaid
graph TD
    subgraph Edge["Clients & Automation"]
        CLI[stella CLI]
        UI[Console SPA]
        APIClients[CI / API Clients]
    end
    Gateway["API Gateway<br/>(JWT + DPoP scopes)"]
    subgraph Scanner["Fact Collection"]
        ScannerWeb[Scanner.WebService]
        ScannerWorkers[Scanner.Workers]
        Agent[Agent Runtime]
    end
    subgraph Ingestion["Aggregation-Only Ingestion (AOC)"]
        Concelier[Concelier.WebService]
        Excititor[Excititor.WebService]
        RawStore[(MongoDB<br/>advisory_raw / vex_raw)]
    end
    subgraph Derivation["Policy & Overlay"]
        Policy[Policy Engine]
        Scheduler[Scheduler Services]
        Notify[Notifier]
    end
    subgraph Experience["UX & Export"]
        UIService[Console Backend]
        Exporters[Export / Offline Kit]
    end
    Observability[Telemetry Stack]

    CLI --> Gateway
    UI --> Gateway
    APIClients --> Gateway
    Gateway --> ScannerWeb
    ScannerWeb --> ScannerWorkers
    ScannerWorkers --> Concelier
    ScannerWorkers --> Excititor
    Concelier --> RawStore
    Excititor --> RawStore
    RawStore --> Policy
    Policy --> Scheduler
    Policy --> Notify
    Policy --> UIService
    Scheduler --> UIService
    UIService --> Exporters
    Exporters --> CLI
    Exporters --> Offline[Offline Kit]
    Observability -.-> ScannerWeb
    Observability -.-> Concelier
    Observability -.-> Excititor
    Observability -.-> Policy
    Observability -.-> Scheduler
    Observability -.-> Notify
```

Key boundaries:

- **AOC border.** Everything inside the Ingestion subgraph writes only immutable raw facts plus link hints. Derived severity, consensus, and risk remain outside the border.
- **Policy-only derivation.** Policy Engine materialises `effective_finding_*` collections and emits overlays; other services consume but never mutate them.
- **Tenant enforcement.** Authority-issued DPoP scopes flow through Gateway to every service; every raw-store document and overlay carries a mandatory `tenant` field.

---

## 2 · Aggregation-Only Contract focus

### 2.1 Responsibilities at the boundary

| Area | Services | Responsibilities under AOC | Forbidden under AOC |
|------|----------|-----------------------------|---------------------|
| **Ingestion (Concelier / Excititor)** | `StellaOps.Concelier.WebService`, `StellaOps.Excititor.WebService` | Fetch upstream advisories/VEX, verify signatures, compute linksets, append immutable documents to `advisory_raw` / `vex_raw`, emit observability signals, expose raw read APIs. | Computing severity, consensus, suppressions, or policy hints; merging upstream sources into a single derived record; mutating existing documents. |
| **Policy & Overlay** | `StellaOps.Policy.Engine`, Scheduler | Join SBOM inventory with raw advisories/VEX, evaluate policies, issue `effective_finding_*` overlays, drive remediation workflows. | Writing to raw collections; bypassing guard scopes; running without recorded provenance. |
| **Experience layers** | Console, CLI, Exporters | Surface raw facts + policy overlays; run `stella aoc verify`; render AOC dashboards and reports. | Accepting ingestion payloads that lack provenance or violate guard results. |

### 2.2 Raw stores

| Collection | Purpose | Key fields | Notes |
|------------|---------|------------|-------|
| `advisory_raw` | Immutable vendor/ecosystem advisory documents. | `_id`, `tenant`, `source.*`, `upstream.*`, `content.raw`, `linkset`, `supersedes`. | Idempotent by `(source.vendor, upstream.upstream_id, upstream.content_hash)`. |
| `vex_raw` | Immutable vendor VEX statements. | Mirrors `advisory_raw`; `identifiers.statements` summarises affected components. | Maintains supersedes chain identical to advisory flow. |
| Change streams (`advisory_raw_stream`, `vex_raw_stream`) | Feed Policy Engine and Scheduler. | `operationType`, `documentKey`, `fullDocument`, `tenant`, `traceId`. | Scope filtered per tenant before delivery. |

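The idempotency note for `advisory_raw` can be sketched as a tuple key over the three listed fields. The in-memory `RawStore` below is a hypothetical stand-in for the MongoDB collection, and the sample document values are invented for illustration.

```python
def idempotency_key(doc: dict) -> tuple:
    """Build the (source.vendor, upstream.upstream_id, upstream.content_hash) key."""
    return (
        doc["source"]["vendor"],
        doc["upstream"]["upstream_id"],
        doc["upstream"]["content_hash"],
    )


class RawStore:
    """In-memory stand-in for an append-only raw collection."""

    def __init__(self):
        self._docs = []
        self._seen = set()

    def append(self, doc: dict) -> bool:
        """Append the document unless an identical key exists; True if stored."""
        key = idempotency_key(doc)
        if key in self._seen:
            return False  # re-ingesting the same upstream content is a no-op
        self._seen.add(key)
        self._docs.append(doc)
        return True

    def __len__(self) -> int:
        return len(self._docs)


# Hypothetical documents: the second revision differs only by content hash.
store = RawStore()
first = {"source": {"vendor": "example"}, "upstream": {"upstream_id": "ADV-1", "content_hash": "sha256:aaa"}}
revised = {"source": {"vendor": "example"}, "upstream": {"upstream_id": "ADV-1", "content_hash": "sha256:bbb"}}
results = [store.append(first), store.append(first), store.append(revised)]
```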
### 2.3 Guarded ingestion sequence

```mermaid
sequenceDiagram
    participant Upstream as Upstream Source
    participant Connector as Concelier/Excititor Connector
    participant Guard as AOCWriteGuard
    participant Mongo as MongoDB (advisory_raw / vex_raw)
    participant Stream as Change Stream
    participant Policy as Policy Engine

    Upstream-->>Connector: CSAF / OSV / VEX document
    Connector->>Connector: Normalize transport, compute content_hash
    Connector->>Guard: Candidate raw doc (source + upstream + content + linkset)
    Guard-->>Connector: ERR_AOC_00x on violation
    Guard->>Mongo: Append immutable document (with tenant & supersedes)
    Mongo-->>Stream: Change event (tenant scoped)
    Stream->>Policy: Raw delta payload
    Policy->>Policy: Evaluate policies, compute effective findings
```

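The guard step in the sequence above can be approximated as a validation function that runs before any write. This is a sketch, not the `AOCWriteGuard` implementation: the forbidden and required field names are assumptions derived from the responsibilities table, and the error text keeps the `ERR_AOC_00x` placeholder because the specific code numbering is not given here.

```python
# Assumed names for derived fields that ingestion must never write.
FORBIDDEN_FIELDS = {"severity", "consensus", "suppressed"}
# Assumed minimum shape of a candidate raw document.
REQUIRED_FIELDS = ("tenant", "source", "upstream", "content")


class AocViolation(Exception):
    """Raised when a candidate document breaks the Aggregation-Only Contract."""

    def __init__(self, reason: str):
        super().__init__(f"ERR_AOC_00x: {reason}")  # placeholder code from the text
        self.reason = reason


def guard(candidate: dict) -> dict:
    """Return the candidate unchanged if it passes, else raise AocViolation."""
    for name in REQUIRED_FIELDS:
        if not candidate.get(name):
            raise AocViolation(f"missing required field '{name}'")
    derived = FORBIDDEN_FIELDS.intersection(candidate)
    if derived:
        raise AocViolation(f"forbidden derived fields: {sorted(derived)}")
    return candidate


# Hypothetical candidate that wrongly carries a computed severity.
bad = {"tenant": "acme", "source": {"vendor": "example"},
       "upstream": {"upstream_id": "ADV-1"}, "content": {"raw": "{}"}, "severity": "high"}
try:
    guard(bad)
    rejected = False
except AocViolation:
    rejected = True
```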
---

### 2.4 Authority scopes & tenancy

| Scope | Holder | Purpose | Notes |
|-------|--------|---------|-------|
| `advisory:ingest` / `vex:ingest` | Concelier / Excititor collectors | Append raw documents through ingestion endpoints. | Paired with tenant claims; requests without tenant are rejected. |
| `advisory:read` / `vex:read` | DevOps verify identity, CLI | Run `stella aoc verify` or call `/aoc/verify`. | Read-only; cannot mutate raw docs. |
| `effective:write` | Policy Engine | Materialise `effective_finding_*` overlays. | Only Policy Engine identity may hold; ingestion contexts receive `ERR_AOC_006` if they attempt. |
| `findings:read` | Console, CLI, exports | Consume derived findings. | Enforced by Gateway and downstream services. |

---

## 3 · Data & control flow highlights

1. **Ingestion:** Concelier / Excititor connectors fetch upstream documents, compute linksets, and hand payloads to `AOCWriteGuard`. Guards validate schema, provenance, forbidden fields, supersedes pointers, and append-only rules before writing to Mongo.
2. **Verification:** `stella aoc verify` (CLI/CI) and `/aoc/verify` endpoints replay guard checks against stored documents, mapping `ERR_AOC_00x` codes to exit codes for automation.
3. **Policy evaluation:** Mongo change streams deliver tenant-scoped raw deltas. Policy Engine joins SBOM inventory (via BOM Index), executes deterministic policies, writes overlays, and emits events to Scheduler/Notify.
4. **Experience surfaces:** Console renders an AOC dashboard showing ingestion latency, guard violations, and supersedes depth. CLI exposes raw-document fetch helpers for auditing. Offline Kit bundles raw collections alongside guard configs to keep air-gapped installs verifiable.
5. **Observability:** All services emit `ingestion_write_total`, `aoc_violation_total{code}`, `ingestion_latency_seconds`, and trace spans `ingest.fetch`, `ingest.transform`, `ingest.write`, `aoc.guard`. Logs correlate via `traceId`, `tenant`, `source.vendor`, and `content_hash`.

---

## 4 · Offline & disaster readiness

- **Offline Kit:** Packages raw Mongo snapshots (`advisory_raw`, `vex_raw`) plus guard configuration and CLI verifier binaries so air-gapped sites can re-run AOC checks before promotion.
- **Recovery:** Supersedes chains allow rollback to prior revisions without mutating documents. Disaster exercises must rehearse restoring from snapshot, replaying change streams into Policy Engine, and re-validating guard compliance.
- **Migration:** Legacy normalised fields are moved to temporary views during cutover; the ingestion runtime stops writing them once the guard-enforced path is live (see [Migration playbook](../ingestion/aggregation-only-contract.md#8-migration-playbook)).

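The recovery bullet relies on each raw document pointing at the revision it supersedes. A minimal sketch of walking that chain to find a rollback target follows; it assumes `supersedes` holds the `_id` of the previous revision (or is absent for the first one), and the function name and sample IDs are hypothetical.

```python
def revision_chain(docs_by_id: dict, latest_id: str) -> list:
    """Return document IDs from the latest revision back to the original."""
    chain, seen, current = [], set(), latest_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in supersedes chain")
        seen.add(current)
        chain.append(current)
        # Follow the pointer to the revision this document replaced.
        current = docs_by_id[current].get("supersedes")
    return chain


# Hypothetical three-revision history; no document is mutated during rollback.
docs = {
    "rev-a": {"supersedes": None},
    "rev-b": {"supersedes": "rev-a"},
    "rev-c": {"supersedes": "rev-b"},
}
history = revision_chain(docs, "rev-c")
rollback_target = history[1]  # the revision immediately before the latest
```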
---

## 5 · References

- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
- [Concelier architecture](../ARCHITECTURE_CONCELIER.md)
- [Excititor architecture](../ARCHITECTURE_EXCITITOR.md)
- [Policy Engine architecture](policy-engine.md)
- [Authority service](../ARCHITECTURE_AUTHORITY.md)
- [Observability standards (upcoming)](../observability/policy.md) – interim reference for telemetry naming.

---

## 6 · Compliance checklist

- [ ] AOC guard enabled for all Concelier and Excititor write paths in production.
- [ ] Mongo schema validators deployed for `advisory_raw` and `vex_raw`; change streams scoped per tenant.
- [ ] Authority scopes (`advisory:*`, `vex:*`, `effective:*`) configured in Gateway and validated via integration tests.
- [ ] `stella aoc verify` wired into CI/CD pipelines with seeded violation fixtures.
- [ ] Console AOC dashboard and CLI documentation reference the new ingestion contract.
- [ ] Offline Kit bundles include guard configs, verifier tooling, and documentation updates.
- [ ] Observability dashboards include violation, latency, and supersedes depth metrics with alert thresholds.

---

*Last updated: 2025-10-26 (Sprint 19).*

@@ -1,243 +0,0 @@
# Policy Engine Architecture (v2)

> **Ownership:** Policy Guild • Platform Guild
> **Services:** `StellaOps.Policy.Engine` (Minimal API + worker host)
> **Data Stores:** MongoDB (`policies`, `policy_runs`, `effective_finding_*`), Object storage (explain bundles), optional NATS/Mongo queue
> **Related docs:** [Policy overview](../policy/overview.md), [DSL](../policy/dsl.md), [Lifecycle](../policy/lifecycle.md), [Runs](../policy/runs.md), [REST API](../api/policy.md), [Policy CLI](../cli/policy.md), [Architecture overview](../architecture/overview.md), [AOC reference](../ingestion/aggregation-only-contract.md)

This dossier describes the internal structure of the Policy Engine service delivered in Epic 2. It focuses on module boundaries, deterministic evaluation, orchestration, and integration contracts with Concelier, Excititor, SBOM Service, Authority, Scheduler, and Observability stacks.

The service operates strictly downstream of the **Aggregation-Only Contract (AOC)**. It consumes immutable `advisory_raw` and `vex_raw` documents emitted by Concelier and Excititor, derives findings inside Policy-owned collections, and never mutates ingestion stores. Refer to the architecture overview and AOC reference for system-wide guardrails and provenance obligations.

---

## 1 · Responsibilities & Constraints

- Compile and evaluate `stella-dsl@1` policy packs into deterministic verdicts.
- Join SBOM inventory, Concelier advisories, and Excititor VEX evidence via canonical linksets and equivalence tables.
- Materialise effective findings (`effective_finding_{policyId}`) with append-only history and produce explain traces.
- Operate incrementally: react to change streams (advisory/vex/SBOM deltas) with ≤ 5 min SLA.
- Provide simulations with diff summaries for UI/CLI workflows without modifying state.
- Enforce a strict determinism guard (no wall-clock, RNG, network beyond allow-listed services) and RBAC + tenancy via Authority scopes.
- Support sealed/air-gapped deployments with offline bundles and sealed-mode hints.

Non-goals: policy authoring UI (handled by Console), ingestion or advisory normalisation (Concelier), VEX consensus (Excititor), runtime enforcement (Zastava).

---

## 2 · High-Level Architecture

```mermaid
graph TD
    subgraph Clients
        CLI[stella CLI]
        UI[Console Policy Editor]
        CI[CI Pipelines]
    end
    subgraph PolicyEngine["StellaOps.Policy.Engine"]
        API[Minimal API Host]
        Orchestrator[Run Orchestrator]
        WorkerPool[Evaluation Workers]
        Compiler[DSL Compiler Cache]
        Materializer[Effective Findings Writer]
    end
    subgraph RawStores["Raw Stores (AOC)"]
        AdvisoryRaw[(MongoDB<br/>advisory_raw)]
        VexRaw[(MongoDB<br/>vex_raw)]
    end
    subgraph Derived["Derived Stores"]
        Mongo[(MongoDB<br/>policies / policy_runs / effective_finding_*)]
        Blob[(Object Store / Evidence Locker)]
        Queue[(Mongo Queue / NATS)]
    end
    Concelier[(Concelier APIs)]
    Excititor[(Excititor APIs)]
    SBOM[(SBOM Service)]
    Authority[(Authority / DPoP Gateway)]

    CLI --> API
    UI --> API
    CI --> API
    API --> Compiler
    API --> Orchestrator
    Orchestrator --> Queue
    Queue --> WorkerPool
    Concelier --> AdvisoryRaw
    Excititor --> VexRaw
    WorkerPool --> AdvisoryRaw
    WorkerPool --> VexRaw
    WorkerPool --> SBOM
    WorkerPool --> Materializer
    Materializer --> Mongo
    WorkerPool --> Blob
    API --> Mongo
    API --> Blob
    API --> Authority
    Orchestrator --> Mongo
    Authority --> API
```

Key notes:

- API host exposes lifecycle, run, simulate, and findings endpoints with DPoP-bound OAuth enforcement.
- Orchestrator manages run scheduling/fairness; writes run tickets to queue, leases jobs to worker pool.
- Workers evaluate policies using cached IR; join external services via tenant-scoped clients; pull immutable advisories/VEX from the raw stores; write derived overlays to Mongo and optional explain bundles to blob storage.
- Observability (metrics/traces/logs) integrated via OpenTelemetry (not shown).

---

### 2.1 · AOC inputs & immutability

- **Raw-only reads.** Evaluation workers access `advisory_raw` / `vex_raw` via tenant-scoped Mongo clients or the Concelier/Excititor raw APIs. No Policy Engine component is permitted to mutate these collections.
- **Guarded ingestion.** `AOCWriteGuard` rejects forbidden fields before data reaches the raw stores. Policy tests replay known `ERR_AOC_00x` violations to confirm ingestion compliance.
- **Change streams as contract.** Run orchestration stores resumable cursors for raw change streams. Replays of these cursors (e.g., after failover) must yield identical materialisation outcomes.
- **Derived stores only.** All severity, consensus, and suppression state lives in `effective_finding_*` collections and explain bundles owned by Policy Engine. Provenance fields link back to raw document IDs so auditors can trace every verdict.
- **Authority scopes.** Only the Policy Engine service identity holds `effective:write`. Ingestion identities retain `advisory:*`/`vex:*` scopes, ensuring separation of duties enforced by Authority and the API Gateway.

---
|
||||
|
||||
## 3 · Module Breakdown

| Module | Responsibility | Notes |
|--------|----------------|-------|
| **Configuration** (`Configuration/`) | Bind settings (Mongo URIs, queue options, service URLs, sealed mode), validate on start. | Strict schema; fails fast on missing secrets. |
| **Authority Client** (`Authority/`) | Acquire tokens, enforce scopes, perform DPoP key rotation. | Only service identity uses `effective:write`. |
| **DSL Compiler** (`Dsl/`) | Parse, canonicalise, generate IR, cache by checksum. | Uses Roslyn-like pipeline; caches by `policyId+version+hash`. |
| **Selection Layer** (`Selection/`) | Batch SBOM ↔ advisory ↔ VEX joiners; apply equivalence tables; support incremental cursors. | Deterministic ordering (SBOM → advisory → VEX). |
| **Evaluator** (`Evaluation/`) | Execute IR with first-match semantics, compute severity/trust/reachability weights, record rule hits. | Stateless; all inputs provided by selection layer. |
| **Materialiser** (`Materialization/`) | Upsert effective findings, append history, manage explain bundle exports. | Mongo transactions per SBOM chunk. |
| **Orchestrator** (`Runs/`) | Change-stream ingestion, fairness, retry/backoff, queue writer. | Works with Scheduler Models DTOs. |
| **API** (`Api/`) | Minimal API endpoints, DTO validation, problem responses, idempotency. | Generated clients for CLI/UI. |
| **Observability** (`Telemetry/`) | Metrics (`policy_run_seconds`, `policy_rules_fired_total`), traces, structured logs. | Sampled rule-hit logs with redaction. |
| **Offline Adapter** (`Offline/`) | Bundle export/import (policies, simulations, runs), sealed-mode enforcement. | Uses DSSE signing via Signer service. |
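
The Evaluator's first-match semantics and the DSL Compiler's cache key can be illustrated with a small sketch. It is written in Python purely for illustration, not in the service's implementation language. `Rule`, `evaluate_first_match`, `cache_key`, and the sample rules are hypothetical; only the first-match behaviour and the `policyId+version+hash` key shape come from the table above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical compiled rule: a predicate plus the verdict it yields."""
    name: str
    matches: Callable[[dict], bool]
    verdict: str


def cache_key(policy_id: str, version: int, digest: str) -> str:
    # Assumed encoding of the `policyId+version+hash` cache key.
    return f"{policy_id}+{version}+{digest}"


def evaluate_first_match(
    rules: Sequence[Rule], finding: dict
) -> Optional[Tuple[str, str]]:
    """Return (rule name, verdict) for the first rule that matches.

    Rules are tried in declaration order; later rules are never consulted
    once one matches, so rule order is part of the policy's meaning.
    """
    for rule in rules:
        if rule.matches(finding):
            return rule.name, rule.verdict
    return None


# Illustrative rule set (names and thresholds are made up).
rules = [
    Rule("vex-not-affected", lambda f: f.get("vex_status") == "not_affected", "suppressed"),
    Rule("critical", lambda f: f.get("cvss", 0.0) >= 9.0, "block"),
    Rule("default", lambda f: True, "warn"),
]
```

Because the function is pure and takes all inputs as arguments, it mirrors the "stateless; all inputs provided by selection layer" note in the table.
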

---

## 4 · Data Model & Persistence

### 4.1 Collections

- `policies` – policy versions, metadata, lifecycle states, simulation artefact references.
- `policy_runs` – run records, inputs (cursors, env), stats, determinism hash, run status.
- `policy_run_events` – append-only log (queued, leased, completed, failed, canceled, replay).
- `effective_finding_{policyId}` – current verdict snapshot per finding.
- `effective_finding_{policyId}_history` – append-only history (previous verdicts, timestamps, runId).
- `policy_reviews` – review comments/decisions.

### 4.2 Schema Highlights

- Run records include `changeDigests` (hash of advisory/VEX inputs) for replay verification.
- Effective findings store provenance references (`advisory_raw_ids`, `vex_raw_ids`, `sbom_component_id`).
- All collections include `tenant`, `policyId`, `version`, `createdAt`, `updatedAt`, `traceId` for audit.

### 4.3 Indexing

- Compound indexes: `{tenant, policyId, status}` on `policies`; `{tenant, policyId, status, startedAt}` on `policy_runs`; `{policyId, sbomId, findingKey}` on findings.
- TTL indexes on transient explain bundle references (configurable).
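
One plausible way to derive a `changeDigests` entry (section 4.2) is to hash the sorted, de-duplicated raw document IDs that fed a run, so the digest is independent of arrival order. The sketch below is Python for illustration only; the function name, the SHA-256 choice, and the newline-delimited encoding are assumptions rather than the Policy Engine's documented format.

```python
import hashlib
from typing import Iterable


def change_digest(raw_ids: Iterable[str]) -> str:
    """Hash a set of raw advisory/VEX document IDs into a stable hex digest.

    Sorting and de-duplicating first means two runs that consumed the same
    inputs in a different order produce the same digest.
    """
    hasher = hashlib.sha256()
    for raw_id in sorted(set(raw_ids)):
        hasher.update(raw_id.encode("utf-8"))
        hasher.update(b"\n")  # delimiter so ("ab", "c") differs from ("a", "bc")
    return hasher.hexdigest()
```

A replay check can then compare the digest stored on the run record with one recomputed from the replayed inputs.
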

---

## 5 · Evaluation Pipeline

```mermaid
sequenceDiagram
    autonumber
    participant Worker as EvaluationWorker
    participant Compiler as CompilerCache
    participant Selector as SelectionLayer
    participant Eval as Evaluator
    participant Mat as Materialiser
    participant Expl as ExplainStore

    Worker->>Compiler: Load IR (policyId, version, digest)
    Compiler-->>Worker: CompiledPolicy (cached or compiled)
    Worker->>Selector: Fetch tuple batches (sbom, advisory, vex)
    Selector-->>Worker: Deterministic batches (1024 tuples)
    loop For each batch
        Worker->>Eval: Execute rules (batch, env)
        Eval-->>Worker: Verdicts + rule hits
        Worker->>Mat: Upsert effective findings
        Mat-->>Worker: Success
        Worker->>Expl: Persist sampled explain traces (optional)
    end
    Worker->>Mat: Append history + run stats
    Worker-->>Worker: Compute determinism hash
    Worker->>+Mat: Finalize transaction
    Mat-->>Worker: Ack
```

Determinism guard instrumentation wraps the evaluator, rejecting access to forbidden APIs and ensuring batch ordering remains stable.
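
The batching and determinism-hash steps in the diagram can be sketched as follows. This is an illustrative Python sketch under stated assumptions: tuples are sorted lexicographically as `(sbom, advisory, vex)` identifiers, and the hash is SHA-256 over canonical JSON. The function names and the `findingKey` sort key are hypothetical; only the 1024-tuple batch size and the requirement for stable ordering come from the text above.

```python
import hashlib
import json
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

BATCH_SIZE = 1024  # batch size shown in the sequence diagram

SelectionTuple = Tuple[str, str, str]  # (sbom id, advisory id, vex id)


def deterministic_batches(
    tuples: Iterable[SelectionTuple], batch_size: int = BATCH_SIZE
) -> Iterator[List[SelectionTuple]]:
    """Yield fixed-size batches in a stable order, whatever the input order."""
    ordered = iter(sorted(tuples))
    while True:
        batch = list(islice(ordered, batch_size))
        if not batch:
            return
        yield batch


def determinism_hash(verdicts: Iterable[dict]) -> str:
    """Hash verdicts as canonical JSON so equal results give equal hashes."""
    canonical = json.dumps(
        sorted(verdicts, key=lambda v: v["findingKey"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```
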

---

## 6 · Run Orchestration & Incremental Flow

- **Change streams:** Concelier and Excititor publish document changes to the scheduler queue (`policy.trigger.delta`). Payload includes `tenant`, `source`, `linkset digests`, `cursor`.
- **Orchestrator:** Maintains a per-tenant backlog; merges deltas until time/size thresholds are met, then enqueues a `PolicyRunRequest`.
- **Queue:** Mongo queue with leases; each job is assigned a `leaseDuration` and `maxAttempts`.
- **Workers:** Lease jobs, execute the evaluation pipeline, and report status (success/failure/canceled). Recoverable failures are requeued with backoff; determinism or schema violations mark the job `failed` and raise an incident event.
- **Fairness:** Round-robin per `{tenant, policyId}`; emergency jobs (`priority=emergency`) jump the queue but are limited by a circuit breaker.
- **Replay:** On demand, the orchestrator rehydrates a run from stored cursors and exports a sealed bundle for audit/CI determinism checks.
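
The round-robin fairness rule can be sketched with one lane per `{tenant, policyId}` pair, serving one job from the front lane and then rotating that lane to the back. This Python sketch is illustrative only: `FairQueue` is a hypothetical in-memory structure, not the Mongo-backed queue, and it leaves out emergency priority, leases, and retries.

```python
from collections import OrderedDict, deque
from typing import Any, Deque, Optional, Tuple

LaneKey = Tuple[str, str]  # (tenant, policy_id)


class FairQueue:
    """Round-robin over (tenant, policy_id) lanes; FIFO within each lane."""

    def __init__(self) -> None:
        self._lanes: "OrderedDict[LaneKey, Deque[Any]]" = OrderedDict()

    def enqueue(self, tenant: str, policy_id: str, job: Any) -> None:
        self._lanes.setdefault((tenant, policy_id), deque()).append(job)

    def dequeue(self) -> Optional[Any]:
        if not self._lanes:
            return None
        key, lane = next(iter(self._lanes.items()))
        job = lane.popleft()
        del self._lanes[key]
        if lane:
            # Rotate the lane to the back so other lanes get a turn first.
            self._lanes[key] = lane
        return job
```

With two jobs queued for one tenant and one for another, the second tenant's job is served between the first tenant's two jobs rather than after both.
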

---

## 7 · Security & Tenancy

- **Auth:** All API calls pass through the Authority gateway; DPoP tokens are enforced for service-to-service calls (Policy Engine service principal). CLI/UI tokens include scope claims.
- **Scopes:** Mutations require the `policy:*` scope corresponding to the action; `effective:write` is restricted to the service identity.
- **Tenancy:** All queries filter by `tenant`. The service identity uses `tenant-global` for shared policies; cross-tenant reads are prohibited unless the `policy:tenant-admin` scope is present.
- **Secrets:** Configuration is loaded via environment variables or sealed secrets; the runtime avoids writing secrets to logs.
- **Determinism guard:** A static analyzer prevents references to forbidden namespaces; a runtime guard intercepts `DateTime.Now`, `Random`, `Guid`, and HTTP clients outside the allow-list.
- **Sealed mode:** A global flag disables outbound network access except to allow-listed internal hosts; watchers fail fast if unexpected egress is attempted.

---

## 8 · Observability

- Metrics:
  - `policy_run_seconds{mode,tenant,policy}` (histogram)
  - `policy_run_queue_depth{tenant}`
  - `policy_rules_fired_total{policy,rule}`
  - `policy_vex_overrides_total{policy,vendor}`
- Logs: Structured JSON with `traceId`, `policyId`, `version`, `runId`, `tenant`, `phase`. Guard ensures no sensitive data leakage.
- Traces: Spans `policy.select`, `policy.evaluate`, `policy.materialize`, `policy.simulate`. Trace IDs surfaced to CLI/UI.
- Incident mode toggles 100% sampling and extended retention windows.
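
As a rough illustration of the structured log shape, the Python sketch below emits one JSON line and refuses to log when a correlation field is missing. The field list is taken from the Logs bullet above; the `log_line` helper, the `message` key, and the fail-on-missing behaviour are assumptions made for the example.

```python
import json

# Correlation fields named in the Logs bullet.
REQUIRED_FIELDS = ("traceId", "policyId", "version", "runId", "tenant", "phase")


def log_line(message: str, **context: object) -> str:
    """Render one structured JSON log line; reject incomplete context."""
    missing = [field for field in REQUIRED_FIELDS if field not in context]
    if missing:
        raise ValueError(f"missing log context fields: {', '.join(missing)}")
    # sort_keys keeps field order stable, which makes log diffs easier to read.
    return json.dumps({"message": message, **context}, sort_keys=True)
```
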

---

## 9 · Offline / Bundle Integration

- **Imports:** Offline Kit delivers policy packs, advisory/VEX snapshots, SBOM updates. Policy Engine ingests bundles via `offline import`.
- **Exports:** `stella policy bundle export` packages policy, IR digest, simulations, run metadata; UI provides export triggers.
- **Sealed hints:** Explain traces annotate when cached values are used (EPSS, KEV). Run records mark `env.sealed=true`.
- **Sync cadence:** Operators perform a monthly bundle sync; Policy Engine warns when snapshots are older than the configured staleness threshold (default 14 days).
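
The staleness warning amounts to comparing a snapshot's age with a threshold. A minimal Python sketch, assuming timezone-aware timestamps and a strict "older than" comparison: `is_stale` and its parameters are hypothetical, and only the 14-day default comes from the text above. The current time is passed in rather than read from the wall clock, in keeping with the determinism rules described elsewhere in this document.

```python
from datetime import datetime, timedelta

DEFAULT_MAX_STALENESS = timedelta(days=14)  # default from the sync-cadence note


def is_stale(
    snapshot_time: datetime,
    now: datetime,
    max_staleness: timedelta = DEFAULT_MAX_STALENESS,
) -> bool:
    """Return True when the snapshot is older than the allowed staleness."""
    return now - snapshot_time > max_staleness
```
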

---

## 10 · Testing & Quality

- **Unit tests:** DSL parsing, evaluator semantics, guard enforcement.
- **Integration tests:** Joiners with sample SBOM/advisory/VEX data; materialisation with deterministic ordering; API contract tests generated from OpenAPI.
- **Property tests:** Ensure rule evaluation is deterministic across input permutations.
- **Golden tests:** Replay recorded runs and compare determinism hashes.
- **Performance tests:** Evaluate a 100k component / 1M advisory dataset under warmed caches (target: full run in under 30 s).
- **Chaos hooks:** Optional toggles to simulate upstream latency/failures; used in staging.

---

## 11 · Compliance Checklist

- [ ] **Determinism guard enforced:** Static analyzer + runtime guard block wall-clock, RNG, unauthorized network calls.
- [ ] **Incremental correctness:** Change-stream cursors stored and replayed during tests; unit/integration coverage for dedupe.
- [ ] **RBAC validated:** Endpoint scope requirements match Authority configuration; integration tests cover deny/allow.
- [ ] **AOC separation enforced:** No code path writes to `advisory_raw` / `vex_raw`; integration tests capture `ERR_AOC_00x` handling; read-only clients verified.
- [ ] **Effective findings ownership:** Only Policy Engine identity holds `effective:write`; unauthorized callers receive `ERR_AOC_006`.
- [ ] **Observability wired:** Metrics/traces/logs exported with correlation IDs; dashboards include `aoc_violation_total` and ingest latency panels.
- [ ] **Offline parity:** Sealed-mode tests executed; bundle import/export flows documented and validated.
- [ ] **Schema docs synced:** DTOs match Scheduler Models (`SCHED-MODELS-20-001`); JSON schemas committed.
- [ ] **Security reviews complete:** Threat model (including queue poisoning, determinism bypass, data exfiltration) documented; mitigations in place.
- [ ] **Disaster recovery rehearsed:** Run replay and rollback procedures tested and recorded.

---

*Last updated: 2025-10-26 (Sprint 19).*