feat: Implement Scheduler Worker Options and Planner Loop

- Added `SchedulerWorkerOptions` class to encapsulate configuration for the scheduler worker.
- Introduced `PlannerBackgroundService` to manage the planner loop, fetching and processing planning runs.
- Created `PlannerExecutionService` to handle the execution logic for planning runs, including impact targeting and run persistence.
- Developed `PlannerExecutionResult` and `PlannerExecutionStatus` to standardize execution outcomes.
- Implemented validation logic within `SchedulerWorkerOptions` to ensure proper configuration.
- Added documentation for the planner loop and impact targeting features.
- Established health check endpoints and authentication mechanisms for the Signals service.
- Created unit tests for the Signals API to ensure proper functionality and response handling.
- Configured options for authority integration and fallback authentication methods.
2025-10-27 09:46:31 +02:00
parent 96d52884e8
commit 730354a1af
135 changed files with 10721 additions and 946 deletions
--- a/docs/security/authority-scopes.md
+++ b/docs/security/authority-scopes.md
@@ -11,28 +11,29 @@ Authority issues short-lived tokens bound to tenants and scopes. Sprint 19 int

 | Scope | Surface | Purpose | Notes |
 |-------|---------|---------|-------|
-| `advisory:write` | Concelier ingestion APIs | Allows append-only writes to `advisory_raw`. | Granted to Concelier WebService and trusted connectors. Requires tenant claim. |
-| `advisory:verify` | Concelier `/aoc/verify`, CLI, UI dashboard | Permits guard verification and access to violation summaries. | Read-only; used by `stella aoc verify` and console dashboard. |
-| `vex:write` | Excititor ingestion APIs | Append-only writes to `vex_raw`. | Mirrors `advisory:write`. |
-| `vex:verify` | Excititor `/aoc/verify`, CLI | Read-only verification of VEX ingestion. | Optional for environments without VEX feeds. |
-| `graph:write` | Cartographer build pipeline | Enqueue graph build/overlay jobs. | Reserved for the Cartographer service identity; requires tenant claim. |
-| `graph:read` | Graph API, Scheduler overlays, UI | Read graph projections/overlays. | Requires tenant claim; granted to Cartographer, Graph API, Scheduler. |
+| `advisory:ingest` | Concelier ingestion APIs | Append-only writes to `advisory_raw` collections. | Requires tenant claim; blocked for global clients. |
+| `advisory:read` | `/aoc/verify`, Concelier dashboards, CLI | Read-only access to stored advisories and guard results. | Needed alongside `aoc:verify` for CLI/console verification. |
+| `vex:ingest` | Excititor ingestion APIs | Append-only writes to `vex_raw`. | Mirrors `advisory:ingest`; tenant required. |
+| `vex:read` | `/aoc/verify`, Excititor dashboards, CLI | Read-only access to stored VEX material. | Pair with `aoc:verify` for guard checks. |
+| `aoc:verify` | CLI/CI pipelines, Console verification jobs | Execute Aggregation-Only Contract guard runs. | Always issued with tenant; read-only combined with `advisory:read`/`vex:read`. |
+| `graph:write` | Cartographer pipeline | Enqueue graph build/overlay jobs. | Reserved for Cartographer service identity; tenant required. |
+| `graph:read` | Graph API, Scheduler overlays, UI | Read graph projections/overlays. | Tenant required; granted to Cartographer, Graph API, Scheduler. |
 | `graph:export` | Graph export endpoints | Stream GraphML/JSONL artefacts. | UI/gateway automation only; tenant required. |
 | `graph:simulate` | Policy simulation overlays | Trigger what-if overlays on graphs. | Restricted to automation; tenant required. |
-| `effective:write` | Policy Engine | Allows creation/update of `effective_finding_*` collections. | **Only** the Policy Engine service client may hold this scope. |
-| `effective:read` | Console, CLI, exports | Read derived findings. | Shared across tenants with role-based restrictions. |
-| `aoc:dashboard` | Console UI | Access AOC dashboard resources. | Bundles `advisory:verify`/`vex:verify` by default; keep for UI RBAC group mapping. |
-| `aoc:verify` | Automation service accounts | Execute verification via API without the full dashboard role. | For CI pipelines, offline kit validators. |
-| Existing scopes | (e.g., `policy:*`, `sbom:*`) | Unchanged. | Review `/docs/security/policy-governance.md` for policy-specific scopes. |
+| `effective:write` | Policy Engine | Create/update `effective_finding_*` collections. | **Only** the Policy Engine service client may hold this scope; tenant required. |
+| `findings:read` | Console, CLI, exports | Read derived findings materialised by Policy Engine. | Shared across tenants with RBAC; tenant claim still enforced. |
+| `vuln:read` | Vuln Explorer API/UI | Read normalized vulnerability data. | Tenant required. |
+| Existing scopes | (e.g., `policy:*`, `concelier.jobs.trigger`) | Unchanged. | Review `/docs/security/policy-governance.md` for policy-specific scopes. |

 ### 1.1 Scope bundles (roles)

- **`role/concelier-ingest`** → `advisory:write`, `advisory:verify`.
- **`role/excititor-ingest`** → `vex:write`, `vex:verify`.
- **`role/aoc-operator`** → `aoc:dashboard`, `aoc:verify`, `advisory:verify`, `vex:verify`.
- **`role/policy-engine`** → `effective:write`, `effective:read`.
+- **`role/concelier-ingest`** → `advisory:ingest`, `advisory:read`.
+- **`role/excititor-ingest`** → `vex:ingest`, `vex:read`.
+- **`role/aoc-operator`** → `aoc:verify`, `advisory:read`, `vex:read`.
+- **`role/policy-engine`** → `effective:write`, `findings:read`.
 - **`role/cartographer-service`** → `graph:write`, `graph:read`.
 - **`role/graph-gateway`** → `graph:read`, `graph:export`, `graph:simulate`.
+- **`role/console`** → `advisory:read`, `vex:read`, `aoc:verify`, `findings:read`, `vuln:read`.

 Roles are declared per tenant in `authority.yaml`:

@@ -41,11 +42,11 @@ tenants:
  - name: default
    roles:
      concelier-ingest:
-        scopes: [advisory:write, advisory:verify]
+        scopes: [advisory:ingest, advisory:read]
      aoc-operator:
-        scopes: [aoc:dashboard, aoc:verify, advisory:verify, vex:verify]
+        scopes: [aoc:verify, advisory:read, vex:read]
      policy-engine:
-        scopes: [effective:write, effective:read]
+        scopes: [effective:write, findings:read]
 ```

 ---
@@ -62,10 +63,10 @@ Tokens now include:

 Authority rejects requests when:

- `tenant` is missing while requesting `advisory:*`, `vex:*`, or `aoc:*` scopes.
+- `tenant` is missing while requesting `advisory:ingest`, `advisory:read`, `vex:ingest`, `vex:read`, or `aoc:verify` scopes.
 - `service_identity != policy-engine` but `effective:write` is present (`ERR_AOC_006` enforcement).
 - `service_identity != cartographer` but `graph:write` is present (graph pipeline enforcement).
- Tokens attempt to combine `advisory:write` with `effective:write` (separation of duties).
+- Tokens attempt to combine `advisory:ingest` with `effective:write` (separation of duties).

 ### 2.2 Propagation

@@ -90,22 +91,30 @@ Add new scopes and optional claims transformations:
 ```yaml
 security:
  scopes:
-    - name: advisory:write
-      description: Concelier raw ingestion
-    - name: advisory:verify
-      description: Verify Concelier ingestion
-    - name: vex:write
+    - name: advisory:ingest
+      description: Concelier raw ingestion (append-only)
+    - name: advisory:read
+      description: Read Concelier advisories and guard verdicts
+    - name: vex:ingest
      description: Excititor raw ingestion
-    - name: vex:verify
-      description: Verify Excititor ingestion
-    - name: aoc:dashboard
-      description: Access AOC UI dashboards
+    - name: vex:read
+      description: Read Excititor VEX records
    - name: aoc:verify
      description: Run AOC verification
    - name: effective:write
      description: Policy Engine materialisation
-    - name: effective:read
+    - name: findings:read
      description: Read derived findings
+    - name: graph:write
+      description: Cartographer build submissions
+    - name: graph:read
+      description: Read graph overlays
+    - name: graph:export
+      description: Export graph artefacts
+    - name: graph:simulate
+      description: Run graph what-if simulations
+    - name: vuln:read
+      description: Read Vuln Explorer data
  claimTransforms:
    - match: { scope: "effective:write" }
      require:
@@ -119,13 +128,13 @@ security:

 Update service clients:

- `Concelier.WebService` → request `advisory:write`, `advisory:verify`.
- `Excititor.WebService` → request `vex:write`, `vex:verify`.
- `Policy.Engine` → request `effective:write`, `effective:read`; set `properties.serviceIdentity=policy-engine`.
+- `Concelier.WebService` → request `advisory:ingest`, `advisory:read`.
+- `Excititor.WebService` → request `vex:ingest`, `vex:read`.
+- `Policy.Engine` → request `effective:write`, `findings:read`; set `properties.serviceIdentity=policy-engine`.
 - `Cartographer.Service` → request `graph:write`, `graph:read`; set `properties.serviceIdentity=cartographer`.
 - `Graph API Gateway` → request `graph:read`, `graph:export`, `graph:simulate`; tenant hint required.
- `Console` → request `aoc:dashboard`, `effective:read` plus existing UI scopes.
- `CLI automation` → request `aoc:verify`, `advisory:verify`, `vex:verify` as needed.
+- `Console` → request `advisory:read`, `vex:read`, `aoc:verify`, `findings:read`, `vuln:read` plus existing UI scopes.
+- `CLI automation` → request `aoc:verify`, `advisory:read`, `vex:read` as needed.

 Client definition snippet:

@@ -133,11 +142,11 @@ Client definition snippet:
 clients:
  - clientId: concelier-web
    grantTypes: [client_credentials]
-    scopes: [advisory:write, advisory:verify]
+    scopes: [advisory:ingest, advisory:read]
    tenants: [default]
  - clientId: policy-engine
    grantTypes: [client_credentials]
-    scopes: [effective:write, effective:read]
+    scopes: [effective:write, findings:read]
    properties:
      serviceIdentity: policy-engine
  - clientId: cartographer-service
@@ -152,7 +161,7 @@ clients:
 ## 4 · Operational safeguards

 - **Audit events:** Authority emits `authority.scope.granted` and `authority.scope.revoked` events with `scope` and `tenant`. Monitor for unexpected grants.
- **Rate limiting:** Apply stricter limits on `/token` endpoints for clients requesting `advisory:write` or `vex:write` to mitigate brute-force ingestion attempts.
+- **Rate limiting:** Apply stricter limits on `/token` endpoints for clients requesting `advisory:ingest` or `vex:ingest` to mitigate brute-force ingestion attempts.
 - **Incident response:** Link AOC alerts to Authority audit logs to confirm whether violations come from expected identities.
 - **Rotation:** Rotate ingest client secrets alongside guard deployments; add rotation steps to `ops/authority-key-rotation.md`.
 - **Testing:** Integration tests must fail if tokens lacking `tenant` attempt ingestion; add coverage in Concelier/Excititor smoke suites (see `CONCELIER-CORE-AOC-19-013`).
@@ -161,7 +170,7 @@ clients:

 ## 5 · Offline & air-gap notes

- Offline Kit bundles include tenant-scoped service credentials. Ensure ingest bundles ship without `advisory:write` scopes unless strictly required.
+- Offline Kit bundles include tenant-scoped service credentials. Ensure ingest bundles ship without `advisory:ingest` scopes unless strictly required.
 - CLI verification in offline environments uses pre-issued `aoc:verify` tokens; document expiration and renewal processes.
 - Authority replicas in air-gapped environments should restrict scope issuance to known tenants and log all `/token` interactions for later replay.

@@ -191,4 +200,4 @@ clients:

 ---

-*Last updated: 2025-10-26 (Sprint 19).* 
+*Last updated: 2025-10-27 (Sprint 19).* 
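
Review note on `authority-scopes.md` (the "Authority rejects requests when" list): the four rejection rules can be read as one small decision function. The sketch below is a minimal illustration in Python, not Authority's actual implementation. The function name `rejection_reason` and the return labels other than `ERR_AOC_006` are hypothetical; the scope names and service identities are taken from the updated document.

```python
from typing import Iterable, Optional

# Scopes that the updated doc says must not be requested without a tenant.
TENANT_REQUIRED = {
    "advisory:ingest", "advisory:read",
    "vex:ingest", "vex:read",
    "aoc:verify",
}


def rejection_reason(
    scopes: Iterable[str],
    tenant: Optional[str] = None,
    service_identity: Optional[str] = None,
) -> Optional[str]:
    """Return a label for the first violated rule, or None if the request passes.

    Only ERR_AOC_006 appears in the source document; the other labels are
    placeholders invented for this sketch.
    """
    requested = set(scopes)

    # Rule 1: tenant-bound scopes need a tenant.
    if tenant is None and requested & TENANT_REQUIRED:
        return "tenant_required"

    # Rule 2: only the Policy Engine identity may hold effective:write.
    if "effective:write" in requested and service_identity != "policy-engine":
        return "ERR_AOC_006"

    # Rule 3: only the Cartographer identity may hold graph:write.
    if "graph:write" in requested and service_identity != "cartographer":
        return "graph_write_reserved"

    # Rule 4: separation of duties between ingestion and materialisation.
    if {"advisory:ingest", "effective:write"} <= requested:
        return "separation_of_duties"

    return None
```

For example, `rejection_reason(["advisory:ingest"])` trips the tenant rule, while the same scopes with `tenant="default"` pass. The order of the checks is a choice made for the sketch; the document does not state a precedence between the rules.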
--- a/docs/security/console-security.md
+++ b/docs/security/console-security.md
@@ -0,0 +1,162 @@
+# StellaOps Console Security Posture
+
+> **Audience:** Security Guild, Console & Authority teams, deployment engineers.  
+> **Scope:** OIDC/DPoP flows, scope model, session controls, CSP and transport headers, evidence handling, offline posture, and monitoring expectations for the StellaOps Console (Sprint 23).
+
+The console is an Angular SPA fronted by the StellaOps Web gateway. It consumes Authority for identity, Concelier/Excititor for aggregation data, Policy Engine for findings, and Attestor for evidence bundles. This guide captures the security guarantees and required hardening so that the console can ship alongside the Aggregation-Only Contract (AOC) without introducing new attack surface.
+
+---
+
+## 1 · Identity & Authentication
+
+### 1.1 Authorization sequence
+
+1. Browser→Authority uses **OAuth 2.1 Authorization Code + PKCE** (`S256`).  
+2. Upon code exchange the console requests a **DPoP-bound access token** (`aud=console`, `tenant=<id>`) with **120 s TTL** and optional **rotating refresh token** (`rotate=true`).  
+3. Authority includes `cnf.jkt` for the ephemeral WebCrypto keypair; console stores the private key in **IndexedDB** (non-exportable) and keeps the public JWK in memory.  
+4. All API calls attach `Authorization: Bearer <token>` + `DPoP` proof header. The gateway issues nonces through the `DPoP-Nonce` header to protect proofs against replay.  
+5. Tenanted API calls flow through the Web gateway which forwards `X-Stella-Tenant` and enforces tenancy headers. Missing or mismatched tenants trigger `403` with `ERR_TENANT_MISMATCH`.
+
+### 1.2 Fresh-auth gating
+
+- Sensitive actions (tenant edits, token revocation, policy promote, signing key rotation) call `Authority /fresh-auth` using `prompt=login` + `max_age=300`.  
+- Successful fresh-auth yields a **300 s** scoped token (`fresh_auth=true`) stored only in memory; the UI disables guarded buttons when the timer expires.  
+- Audit events: `authority.fresh_auth.start`, `authority.fresh_auth.success`, `authority.fresh_auth.expired` (link to correlation IDs for the gated action).
+
+### 1.3 Offline & sealed mode
+
+- When `console.offlineMode=true` the console presents an offline banner and suppresses fresh-auth prompts, replacing them with CLI guidance (`stella auth fresh-auth --offline`).  
+- Offline mode requires pre-issued tenant-scoped tokens bundled with the Offline Kit; tokens must include an `offline=true` claim and a 15-minute TTL.  
+- Authority availability health is polled via `/api/console/status`. HTTP failures raise the offline banner and switch to read-only behaviour.
+
+---
+
+## 2 · Session & Device Binding
+
+- Access and refresh tokens live in memory; metadata (subject, tenant, expiry) persists in `sessionStorage` for reload continuity. **Never** store raw JWTs in `localStorage`.  
+- Inactivity timeout defaults to **15 minutes**. Idle sessions trigger silent refresh; on failure the UI shows a modal requiring re-auth.  
+- Tokens are device-bound through DPoP; if a new device logs in, Authority revokes the previous DPoP key and emits `authority.token.binding_changed`.  
+- CSRF mitigations: bearer tokens plus DPoP remove cookie reliance. If cookies are required (e.g., same-origin analytics) they must be `HttpOnly`, `SameSite=Lax`, `Secure`.  
+- Browser hardening: enforce `Strict-Transport-Security`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: no-referrer`, `Permissions-Policy: camera=(), microphone=(), geolocation=()`.
+
+---
+
+## 3 · Authorization & Scope Model
+
+The console client is registered in Authority as `console-ui` with scopes:
+
+| Feature area | Required scopes | Notes |
+|--------------|----------------|-------|
+| Base navigation (Dashboard, Findings, SBOM, Runs) | `ui.read`, `findings:read`, `advisory:read`, `vex:read`, `aoc:verify` | `findings:read` enables Policy Engine overlays; `advisory:read`/`vex:read` load ingestion panes; `aoc:verify` allows on-demand guard runs. |
+| Admin workspace | `ui.admin`, `authority:tenants.read`, `authority:tenants.write`, `authority:roles.read`, `authority:roles.write`, `authority:tokens.read`, `authority:tokens.revoke`, `authority:clients.read`, `authority:clients.write`, `authority:audit.read` | Scope combinations are tenant constrained. Role changes require fresh-auth. |
+| Policy approvals | `policy:read`, `policy:review`, `policy:approve`, `policy:activate`, `policy:runs` | `policy:activate` gated behind fresh-auth. |
+| Observability panes (status ticker, telemetry) | `ui.telemetry`, `scheduler:runs.read`, `advisory:read`, `vex:read` | `ui.telemetry` drives OTLP export toggles. |
+| Downloads parity (SBOM, attestation) | `downloads:read`, `attestation:verify`, `sbom:export` | Console surfaces digests only; download links require CLI parity for write operations. |
+
+Guidance:
+
+- **Role mapping**: Provision Authority role `role/ui-console-admin` encapsulating the admin scopes above.  
+- **Tenant enforcement**: Gateway injects `X-Stella-Tenant` from token claims. Requests missing the header must be rejected by downstream services (Concelier, Excititor, Policy Engine) and logged.  
+- **Separation of duties**: Never grant `ui.admin` and `policy:approve` to the same human role without SOC sign-off; automation accounts should use least-privilege dedicated clients.
+
+---
+
+## 4 · Transport, CSP & Browser Hardening
+
+### 4.1 Gateway requirements
+
+- TLS 1.2+ with modern cipher suites; enable HTTP/2 for SSE streams.  
+- Terminate TLS at the reverse proxy (Traefik, NGINX) and forward `X-Forwarded-*` headers (`ASPNETCORE_FORWARDEDHEADERS_ENABLED=true`).  
+- Rate-limit `/authorize` and `/token` according to [Authority rate-limit guidance](rate-limits.md).
+
+### 4.2 Content Security Policy
+
+Default CSP served by the console container:
+
+```
+default-src 'self';
+connect-src 'self' https://*.stella-ops.local;
+img-src 'self' data:;
+script-src 'self';
+style-src 'self' 'unsafe-inline';
+font-src 'self';
+frame-ancestors 'none';
+```
+
+Recommendations:
+
+- Extend `connect-src` only for known internal APIs (e.g., telemetry collector). Use `console.config.cspOverrides` instead of editing NGINX directly.  
+- Enable **COOP/COEP** (`Cross-Origin-Opener-Policy: same-origin`, `Cross-Origin-Embedder-Policy: require-corp`) to support WASM policy previews.  
+- Use **Subresource Integrity (SRI)** hashes when adding third-party fonts or scripts.  
+- For embedded screenshots/GIFs sourced from Offline Kit, use `img-src 'self' data: blob:` and verify assets during build.  
+- Enforce `X-Frame-Options: DENY`, `X-XSS-Protection: 0`, and `Cache-Control: no-store` on JSON API responses (HTML assets remain cacheable).
+
+### 4.3 SSE & WebSocket hygiene
+
+- SSE endpoints (`/console/status/stream`, `/console/runs/{id}/events`) must set `Cache-Control: no-store` and disable proxy buffering.  
+- Gate SSE behind the same DPoP tokens; reject without `Authorization`.  
+- Proxy timeouts ≥ 60 s to avoid disconnect storms; clients use exponential backoff with jitter.
+
+---
+
+## 5 · Evidence & Data Handling
+
+- **Evidence bundles**: Download links trigger `attestor.verify` or `downloads.manifest` APIs. The UI never caches bundle contents; it only surfaces SHA-256 digests and cosign signatures. Operators must use CLI to fetch the signed artefact.  
+- **Secrets**: UI redacts tokens, emails, and attachment paths in logs. Structured logs include only `subject`, `tenant`, `action`, `correlationId`.  
+- **Aggregation data**: Console honours the Aggregation-Only Contract: no client-side rewriting of Concelier/Excititor precedence. Provenance badges display source IDs and merge-event hashes.  
+- **PII minimisation**: User lists show minimal identity (display name, email hash). Full email addresses require `ui.admin` + fresh-auth.  
+- **Downloads parity**: Every downloadable artefact includes a CLI parity link (e.g., `stella downloads fetch --artifact <id>`). If CLI parity fails, the console displays a warning banner and links to troubleshooting docs.
+
+---
+
+## 6 · Logging, Monitoring & Alerts
+
+- Structured logs: `ui.action`, `tenantId`, `subject`, `scope`, `correlationId`, `dpop.jkt`. Log level `Information` for key actions; `Warning` for security anomalies (failed DPoP, tenant mismatch).  
+- Metrics (Prometheus): `ui_request_duration_seconds`, `ui_dpop_failure_total`, `ui_fresh_auth_prompt_total`, `ui_tenant_switch_total`, `ui_offline_banner_seconds`.  
+- Alerts:
+  1. **Fresh-auth failures** > 5 per minute per tenant → security review.  
+  2. **DPoP mismatches** sustained > 1 % of requests → potential replay attempt.  
+  3. **Tenant mismatches** > 0 triggers an audit incident (could indicate scope misconfiguration).  
+- Correlate with Authority audit events (`authority.scope.granted`, `authority.token.revoked`) and Concelier/Excititor ingestion logs to trace user impact.
+
+---
+
+## 7 · Offline & Air-Gapped Posture
+
+- Offline deployments require mirrored container images and Offline Kit manifest verification (see `/docs/deploy/console.md` §7).  
+- Console reads `offlineManifest.json` at boot to validate asset digests; mismatches block startup until the manifest is refreshed.  
+- Tenant and role edits queue change manifests for export; UI instructs operators to run `stella auth apply --bundle <file>` on the offline Authority host.  
+- Evidence viewing remains read-only; download buttons provide scripts to export from local Attestor snapshots.  
+- Fresh-auth prompts display instructions for hardware-token usage on bastion hosts; system logs mark actions executed under offline fallback.
+
+---
+
+## 8 · Threat Model Alignment
+
+| Threat (Authority TM §5) | Console control |
+|--------------------------|-----------------|
+| Spoofed revocation bundle | Console verifies manifest signatures before showing revocation status; links to `stella auth revoke verify`. |
+| Parameter tampering on `/token` | PKCE + DPoP enforced; console propagates correlation IDs so Authority logs can link anomalies. |
+| Bootstrap invite replay | Admin UI surfaces invite status with expiry; fresh-auth required before issuing new invites. |
+| Token replay by stolen agent | DPoP binding prevents reuse; console surfaces revocation latency warnings sourced from Zastava metrics. |
+| Offline bundle tampering | Console refuses unsigned Offline Kit assets; prompts operators to re-import verified bundles. |
+| Privilege escalation via plug-in overrides | Plug-in manifest viewer warns when a plug-in downgrades password policy; UI restricts plug-in activation to fresh-auth + `ui.admin` scoped users. |
+
+Document gaps and remediation hooks in `SEC5.*` backlog as they are addressed.
+
+---
+
+## 9 · Compliance checklist
+
+- [ ] Authority client `console-ui` registered with PKCE, DPoP, tenant claim requirement, and scopes from §3.  
+- [ ] CSP enforced per §4 with overrides documented in deployment manifests.  
+- [ ] Fresh-auth timer (300 s) validated for admin and policy actions; audit events captured.  
+- [ ] DPoP binding tested (replay attempt blocked; logs show `ui_dpop_failure_total` increment).  
+- [ ] Offline mode exercises performed (banner, CLI guidance, manifest verification).  
+- [ ] Evidence download parity verified with CLI scripts; console never caches sensitive artefacts.  
+- [ ] Monitoring dashboards show metrics and alerts outlined in §6; alert runbooks reviewed with Security Guild.  
+- [ ] Security review sign-off recorded in sprint log with links to Authority threat model references.
+
+---
+
+*Last updated: 2025-10-28 (Sprint 23).*