Add Policy DSL Validator, Schema Exporter, and Simulation Smoke tools

- Implemented PolicyDslValidator with command-line options for strict mode and JSON output.
- Created PolicySchemaExporter to generate JSON schemas for policy-related models.
- Developed PolicySimulationSmoke tool to validate policy simulations against expected outcomes.
- Added project files and necessary dependencies for each tool.
- Ensured proper error handling and usage instructions across tools.
@@ -629,6 +629,32 @@ See `docs/dev/32_AUTH_CLIENT_GUIDE.md` for recommended profiles (online vs. air-
| `stellaops-cli config show` | Display resolved configuration | — | Masks secret values; helpful for air‑gapped installs |
| `stellaops-cli runtime policy test` | Ask Scanner.WebService for runtime verdicts (Webhook parity) | `--image/-i <digest>` (repeatable, comma/space lists supported)<br>`--file/-f <path>`<br>`--namespace/--ns <name>`<br>`--label/-l key=value` (repeatable)<br>`--json` | Posts to `POST /api/v1/scanner/policy/runtime`, deduplicates image digests, and prints TTL/policy revision plus per-image columns for signed state, SBOM referrers, quieted-by metadata, confidence, Rekor attestation (uuid + verified flag), and recently observed build IDs (shortened for readability). Accepts newline/whitespace-delimited stdin when piped; `--json` emits the raw response without additional logging. |
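
The `runtime policy test` row above says the command accepts comma- or whitespace-separated digest lists and deduplicates them. If you assemble digest lists from several sources before piping them in, the same normalisation can be sketched in plain shell. This is only an illustrative pre-processing step, not the CLI's implementation, and the function name `normalize_digests` is hypothetical.

```bash
# Hypothetical helper (not part of stellaops-cli): turn a comma-, space-,
# tab-, or newline-separated list of image references into one unique
# entry per line, keeping the first occurrence of each.
normalize_digests() {
  # Map every separator to a newline, then drop empty lines and repeats.
  tr ', \t' '\n\n\n' | awk 'NF && !seen[$0]++'
}
```

Because the table states that piped stdin is accepted, the output could then be fed to the command, for example `normalize_digests < digests.txt | stellaops-cli runtime policy test --namespace payments --json` (the file name `digests.txt` is a placeholder).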

#### Example: Pivot from runtime verdicts to debug symbols

```bash
$ stellaops-cli runtime policy test \
    --image ghcr.io/acme/payments@sha256:4f7d55f6... \
    --namespace payments

Image Digest                              Signed  SBOM     Build IDs                         TTL
ghcr.io/acme/payments@sha256:4f7d55f6...  yes     present  5f0c7c3c..., 1122aabbccddeeff...  04:59:55
```

1. Copy one of the hashes (e.g. `5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789`) and locate the bundled debug artefact:
   ```bash
   ls offline-kit/debug/.build-id/5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug
   ```
2. Confirm the running binary advertises the same GNU build-id:
   ```bash
   readelf -n /proc/$(pgrep -f payments-api | head -n1)/exe | grep -i 'Build ID'
   ```
3. If you operate a debuginfod mirror backed by the Offline Kit tree, resolve symbols with:
   ```bash
   debuginfod-find debuginfo 5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789 >/tmp/payments-api.debug
   ```

See [Offline Kit step 0](24_OFFLINE_KIT.md#0-prepare-the-debug-store) for instructions on mirroring the debug store before packaging.
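
The path used in step 1 follows the usual build-id layout: the first two hex characters of the build ID become a directory, and the remaining characters become the file name with a `.debug` suffix. A minimal sketch of that mapping, assuming the debug store lives under `offline-kit/debug` as in the example (the function name `build_id_path` is hypothetical):

```bash
# Hypothetical helper: map a GNU build-id to its expected path in a debug store.
# Assumes the .build-id/<first two hex chars>/<remaining chars>.debug layout
# shown in step 1; the second argument overrides the store root.
build_id_path() (
  id=$1
  root=${2:-offline-kit/debug}
  prefix=$(printf '%s' "$id" | cut -c1-2)   # directory: first two hex chars
  rest=$(printf '%s' "$id" | cut -c3-)      # file name: everything after them
  printf '%s/.build-id/%s/%s.debug\n' "$root" "$prefix" "$rest"
)
```

With this in place, `ls "$(build_id_path 5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789)"` is equivalent to the `ls` command in step 1.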

`POST /api/v1/scanner/policy/runtime` responds with one entry per digest. Each result now includes:

- `policyVerdict` (`pass|warn|fail|error`), `signed`, and `hasSbomReferrers` parity with the webhook contract.
@@ -739,7 +765,7 @@ For offline workflows, configure `StellaOps:Offline:KitsDirectory` (or `STELLAOP
    "ClientSecret": "REDACTED",
    "Username": "",
    "Password": "",
    "Scope": "concelier.jobs.trigger",
    "Scope": "concelier.jobs.trigger advisory:ingest advisory:read",
    "TokenCacheDirectory": ""
  }
}
@@ -1,61 +1,61 @@
# 10 · Concelier + CLI Quickstart

This guide walks through configuring the Concelier web service and the `stellaops-cli`
tool so an operator can ingest advisories, merge them, and publish exports from a
single workstation. It focuses on deployment-facing surfaces only (configuration,
runtime wiring, CLI usage) and leaves connector/internal customization for later.

---

## 0 · Prerequisites

- .NET SDK **10.0.100-preview** (matches `global.json`)
- MongoDB instance reachable from the host (local Docker or managed)
- `trivy-db` binary on `PATH` for Trivy exports (and `oras` if publishing to OCI)
- Plugin assemblies present in `StellaOps.Concelier.PluginBinaries/` (already included in the repo)
- Optional: Docker/Podman runtime if you plan to run scanners locally

> **Tip** – air-gapped installs should preload `trivy-db` and `oras` binaries into the
> runner image since Concelier never fetches them dynamically.

---

## 1 · Configure Concelier

1. Copy the sample config to the expected location (CI/CD pipelines can stamp values
   into this file during deployment—see the “Deployment automation” note below):

   ```bash
   mkdir -p etc
   cp etc/concelier.yaml.sample etc/concelier.yaml
   ```

2. Edit `etc/concelier.yaml` and update the MongoDB DSN (and optional database name).
   The default template configures plug-in discovery to look in `StellaOps.Concelier.PluginBinaries/`
   and disables remote telemetry exporters by default.

3. (Optional) Override settings via environment variables. All keys are prefixed with
   `CONCELIER_`. Example:

   ```bash
   export CONCELIER_STORAGE__DSN="mongodb://user:pass@mongo:27017/concelier"
   export CONCELIER_TELEMETRY__ENABLETRACING=false
   ```

4. Start the web service from the repository root:

   ```bash
   dotnet run --project src/StellaOps.Concelier.WebService
   ```

   On startup Concelier validates the options, boots MongoDB indexes, loads plug-ins,
   and exposes:

   - `GET /health` – returns service status and telemetry settings
   - `GET /ready` – performs a MongoDB `ping`
   - `GET /jobs` + `POST /jobs/{kind}` – inspect and trigger connector/export jobs

> **Security note** – authentication now ships via StellaOps Authority. Keep
> `authority.allowAnonymousFallback: true` only during the staged rollout and
> disable it before **2025-12-31 UTC** so tokens become mandatory.
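
The environment overrides in step 3 appear to follow one naming pattern: the `CONCELIER_` prefix, a double underscore between nested sections, and upper-cased key names. The sketch below derives a variable name from a dotted configuration key under that assumption. The function name `concelier_env_name` is hypothetical, and the dotted key spellings (`storage.dsn`, `telemetry.enableTracing`) are inferred from the exported names rather than quoted from `concelier.yaml`.

```bash
# Hypothetical helper: derive the override variable for a dotted config key.
# Assumes the pattern seen in this guide: CONCELIER_ prefix, "__" between
# nested sections (and list indexes), and upper-cased names.
concelier_env_name() {
  printf 'CONCELIER_%s\n' "$1" | sed 's/\./__/g' | tr '[:lower:]' '[:upper:]'
}
```

For example, `concelier_env_name storage.dsn` prints the first variable exported in step 3, which can help when stamping a long list of overrides from a pipeline.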
@@ -66,230 +66,244 @@ Rollout checkpoints for the two Authority toggles:
| **Validation (staging)** | `true` | `true` | Verify token issuance, CLI scopes, and audit log noise without breaking cron jobs. | Watch `Concelier.Authorization.Audit` for `bypass=True` events and scope gaps; confirm CLI `auth status` succeeds. |
| **Cutover rehearsal** | `true` | `false` | Exercise production-style enforcement before the deadline; ensure only approved maintenance ranges remain in `bypassNetworks`. | Expect some HTTP 401s; verify `web.jobs.triggered` metrics flatten for unauthenticated calls and audit logs highlight missing tokens. |
| **Enforced (steady state)** | `true` | `false` | Production baseline after the 2025-12-31 UTC cutoff. | Alert on new `bypass=True` entries and on repeated 401 bursts; correlate with Authority availability dashboards. |

### Authority companion configuration (preview)

1. Copy the Authority sample configuration:

   ```bash
   cp etc/authority.yaml.sample etc/authority.yaml
   ```

2. Update the issuer URL, token lifetimes, and plug-in descriptors to match your
   environment. Authority expects per-plugin manifests in `etc/authority.plugins/`;
   sample `standard.yaml` and `ldap.yaml` files are provided as starting points.
   For air-gapped installs keep the default plug-in binary directory
   (`../StellaOps.Authority.PluginBinaries`) so packaged plug-ins load without outbound access.

3. Environment variables prefixed with `STELLAOPS_AUTHORITY_` override individual
   fields. Example:

   ```bash
   export STELLAOPS_AUTHORITY__ISSUER="https://authority.stella-ops.local"
   export STELLAOPS_AUTHORITY__PLUGINDIRECTORIES__0="/srv/authority/plugins"
   ```

---

## 2 · Configure the CLI

The CLI reads configuration from JSON/YAML files *and* environment variables. The
defaults live in `src/StellaOps.Cli/appsettings.json` and expect overrides at runtime.

| Setting | Environment variable | Default | Purpose |
| ------- | -------------------- | ------- | ------- |
| `BackendUrl` | `STELLAOPS_BACKEND_URL` | _empty_ | Base URL of the Concelier web service |
| `ApiKey` | `API_KEY` | _empty_ | Reserved for legacy key auth; leave empty when using Authority |
| `ScannerCacheDirectory` | `STELLAOPS_SCANNER_CACHE_DIRECTORY` | `scanners` | Local cache folder |
| `ResultsDirectory` | `STELLAOPS_RESULTS_DIRECTORY` | `results` | Where scan outputs are written |
| `Authority.Url` | `STELLAOPS_AUTHORITY_URL` | _empty_ | StellaOps Authority issuer/token endpoint |
| `Authority.ClientId` | `STELLAOPS_AUTHORITY_CLIENT_ID` | _empty_ | Client identifier for the CLI |
| `Authority.ClientSecret` | `STELLAOPS_AUTHORITY_CLIENT_SECRET` | _empty_ | Client secret (omit when using username/password grant) |
| `Authority.Username` | `STELLAOPS_AUTHORITY_USERNAME` | _empty_ | Username for password grant flows |
| `Authority.Password` | `STELLAOPS_AUTHORITY_PASSWORD` | _empty_ | Password for password grant flows |
| `Authority.Scope` | `STELLAOPS_AUTHORITY_SCOPE` | `concelier.jobs.trigger` | OAuth scope requested for backend operations |
| `Authority.TokenCacheDirectory` | `STELLAOPS_AUTHORITY_TOKEN_CACHE_DIR` | `~/.stellaops/tokens` | Directory that persists cached tokens |
| `Authority.Resilience.EnableRetries` | `STELLAOPS_AUTHORITY_ENABLE_RETRIES` | `true` | Toggle Polly retry handler for Authority HTTP calls |
| `Authority.Resilience.RetryDelays` | `STELLAOPS_AUTHORITY_RETRY_DELAYS` | `1s,2s,5s` | Comma- or space-separated backoff delays (hh:mm:ss) |
| `Authority.Resilience.AllowOfflineCacheFallback` | `STELLAOPS_AUTHORITY_ALLOW_OFFLINE_CACHE_FALLBACK` | `true` | Allow CLI to reuse cached discovery/JWKS metadata when Authority is offline |
| `Authority.Resilience.OfflineCacheTolerance` | `STELLAOPS_AUTHORITY_OFFLINE_CACHE_TOLERANCE` | `00:10:00` | Additional tolerance window applied to cached metadata |

Example bootstrap:

```bash
export STELLAOPS_BACKEND_URL="http://localhost:5000"
export STELLAOPS_RESULTS_DIRECTORY="$HOME/.stellaops/results"
export STELLAOPS_AUTHORITY_URL="https://authority.local"
export STELLAOPS_AUTHORITY_CLIENT_ID="concelier-cli"
export STELLAOPS_AUTHORITY_CLIENT_SECRET="s3cr3t"
dotnet run --project src/StellaOps.Cli -- db merge

# Acquire a bearer token and confirm cache state
dotnet run --project src/StellaOps.Cli -- auth login
dotnet run --project src/StellaOps.Cli -- auth status
dotnet run --project src/StellaOps.Cli -- auth whoami
```

Refer to `docs/dev/32_AUTH_CLIENT_GUIDE.md` for deeper guidance on tuning retry/offline settings and rollout checklists.

To persist configuration, you can create `stellaops-cli.yaml` next to the binary or
rely on environment variables for ephemeral runners.

---

## 3 · Operating Workflow

1. **Trigger connector fetch stages**

   ```bash
   dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage fetch
   dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage parse
   dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage map
   ```

   Use `--mode resume` when continuing from a previous window:

   ```bash
   dotnet run --project src/StellaOps.Cli -- db fetch --source redhat --stage fetch --mode resume
   ```

2. **Merge canonical advisories**

   ```bash
   dotnet run --project src/StellaOps.Cli -- db merge
   ```

3. **Produce exports**

   ```bash
   # JSON tree (vuln-list style)
   dotnet run --project src/StellaOps.Cli -- db export --format json

   # Trivy DB (delta example)
   dotnet run --project src/StellaOps.Cli -- db export --format trivy-db --delta
   ```

   Concelier always produces a deterministic OCI layout. The first run after a clean
   bootstrap emits a **full** baseline; subsequent `--delta` runs reuse the previous
   baseline’s blobs when only JSON manifests change. If the exporter detects that a
   prior delta is still active (i.e., `LastDeltaDigest` is recorded) it automatically
   upgrades the next run to a full export and resets the baseline so operators never
   chain deltas indefinitely. The CLI exposes `--publish-full/--publish-delta` (for
   ORAS pushes) and `--include-full/--include-delta` (for offline bundles) should you
   need to override the defaults interactively.

   **Smoke-check delta reuse:** after the first baseline completes, run the export a
   second time with `--delta` and verify that the new directory reports `mode=delta`
   while reusing the previous layer blob.

   ```bash
   export_root=${CONCELIER_EXPORT_ROOT:-exports/trivy}
   base=$(ls -1d "$export_root"/* | sort | tail -n2 | head -n1)
   delta=$(ls -1d "$export_root"/* | sort | tail -n1)

   jq -r '.mode,.baseExportId' "$delta/metadata.json"

   base_manifest=$(jq -r '.manifests[0].digest' "$base/index.json")
   delta_manifest=$(jq -r '.manifests[0].digest' "$delta/index.json")
   printf 'baseline manifest: %s\ndelta manifest: %s\n' "$base_manifest" "$delta_manifest"

   layer_digest=$(jq -r '.layers[0].digest' "$base/blobs/sha256/${base_manifest#sha256:}")
   cmp "$base/blobs/sha256/${layer_digest#sha256:}" \
       "$delta/blobs/sha256/${layer_digest#sha256:}"
   ```

   `cmp` returning exit code `0` confirms the delta export reuses the baseline’s
   `db.tar.gz` layer instead of rebuilding it.

4. **Manage scanners (optional)**

   ```bash
   dotnet run --project src/StellaOps.Cli -- scanner download --channel stable
   dotnet run --project src/StellaOps.Cli -- scan run --entry scanners/latest/Scanner.dll --target ./sboms
   dotnet run --project src/StellaOps.Cli -- scan upload --file results/scan-001.json
   ```

   Add `--verbose` to any command for structured console logs. All commands honour
   `Ctrl+C` cancellation and exit with non-zero status codes when the backend returns
   a problem document.

---

## 4 · Verification Checklist

- Concelier `/health` returns `"status":"healthy"` and Storage bootstrap is marked
  complete after startup.
- CLI commands return HTTP 202 with a `Location` header (job tracking URL) when
  triggering Concelier jobs.
- Export artefacts are materialised under the configured output directories and
  their manifests record digests.
- MongoDB contains the expected `document`, `dto`, `advisory`, and `export_state`
  collections after a run.

---

## 5 · Deployment Automation

- Treat `etc/concelier.yaml.sample` as the canonical template. CI/CD should copy it to
  the deployment artifact and replace placeholders (DSN, telemetry endpoints, cron
  overrides) with environment-specific secrets.
- Keep secret material (Mongo credentials, OTLP tokens) outside of the repository;
  inject them via secret stores or pipeline variables at stamp time.
- When building container images, include `trivy-db` (and `oras` if used) so air-gapped
  clusters do not need outbound downloads at runtime.

---

## 5 · Next Steps

- Enable authority-backed authentication in non-production first. Set
  `authority.enabled: true` while keeping `authority.allowAnonymousFallback: true`
  to observe logs, then flip it to `false` before 2025-12-31 UTC to enforce tokens.
- Automate the workflow above via CI/CD (compose stack or Kubernetes CronJobs).
- Pair with the Concelier connector teams when enabling additional sources so their
  module-specific requirements are pulled in safely.

---

### Authority companion configuration (preview)

1. Copy the Authority sample configuration:

   ```bash
   cp etc/authority.yaml.sample etc/authority.yaml
   ```

2. Update the issuer URL, token lifetimes, and plug-in descriptors to match your
   environment. Authority expects per-plugin manifests in `etc/authority.plugins/`;
   sample `standard.yaml` and `ldap.yaml` files are provided as starting points.
   For air-gapped installs keep the default plug-in binary directory
   (`../StellaOps.Authority.PluginBinaries`) so packaged plug-ins load without outbound access.

3. Environment variables prefixed with `STELLAOPS_AUTHORITY_` override individual
   fields. Example:

   ```bash
   export STELLAOPS_AUTHORITY__ISSUER="https://authority.stella-ops.local"
   export STELLAOPS_AUTHORITY__PLUGINDIRECTORIES__0="/srv/authority/plugins"
   ```

---

## 2 · Configure the CLI

The CLI reads configuration from JSON/YAML files *and* environment variables. The
defaults live in `src/StellaOps.Cli/appsettings.json` and expect overrides at runtime.

| Setting | Environment variable | Default | Purpose |
| ------- | -------------------- | ------- | ------- |
| `BackendUrl` | `STELLAOPS_BACKEND_URL` | _empty_ | Base URL of the Concelier web service |
| `ApiKey` | `API_KEY` | _empty_ | Reserved for legacy key auth; leave empty when using Authority |
| `ScannerCacheDirectory` | `STELLAOPS_SCANNER_CACHE_DIRECTORY` | `scanners` | Local cache folder |
| `ResultsDirectory` | `STELLAOPS_RESULTS_DIRECTORY` | `results` | Where scan outputs are written |
| `Authority.Url` | `STELLAOPS_AUTHORITY_URL` | _empty_ | StellaOps Authority issuer/token endpoint |
| `Authority.ClientId` | `STELLAOPS_AUTHORITY_CLIENT_ID` | _empty_ | Client identifier for the CLI |
| `Authority.ClientSecret` | `STELLAOPS_AUTHORITY_CLIENT_SECRET` | _empty_ | Client secret (omit when using username/password grant) |
| `Authority.Username` | `STELLAOPS_AUTHORITY_USERNAME` | _empty_ | Username for password grant flows |
| `Authority.Password` | `STELLAOPS_AUTHORITY_PASSWORD` | _empty_ | Password for password grant flows |
| `Authority.Scope` | `STELLAOPS_AUTHORITY_SCOPE` | `concelier.jobs.trigger advisory:ingest` | Space-separated OAuth scopes requested for backend operations |
| `Authority.TokenCacheDirectory` | `STELLAOPS_AUTHORITY_TOKEN_CACHE_DIR` | `~/.stellaops/tokens` | Directory that persists cached tokens |
| `Authority.Resilience.EnableRetries` | `STELLAOPS_AUTHORITY_ENABLE_RETRIES` | `true` | Toggle Polly retry handler for Authority HTTP calls |
| `Authority.Resilience.RetryDelays` | `STELLAOPS_AUTHORITY_RETRY_DELAYS` | `1s,2s,5s` | Comma- or space-separated backoff delays (hh:mm:ss) |
| `Authority.Resilience.AllowOfflineCacheFallback` | `STELLAOPS_AUTHORITY_ALLOW_OFFLINE_CACHE_FALLBACK` | `true` | Allow CLI to reuse cached discovery/JWKS metadata when Authority is offline |
| `Authority.Resilience.OfflineCacheTolerance` | `STELLAOPS_AUTHORITY_OFFLINE_CACHE_TOLERANCE` | `00:10:00` | Additional tolerance window applied to cached metadata |

Example bootstrap:

```bash
export STELLAOPS_BACKEND_URL="http://localhost:5000"
export STELLAOPS_RESULTS_DIRECTORY="$HOME/.stellaops/results"
export STELLAOPS_AUTHORITY_URL="https://authority.local"
export STELLAOPS_AUTHORITY_CLIENT_ID="concelier-cli"
export STELLAOPS_AUTHORITY_CLIENT_SECRET="s3cr3t"
export STELLAOPS_AUTHORITY_SCOPE="concelier.jobs.trigger advisory:ingest advisory:read"
dotnet run --project src/StellaOps.Cli -- db merge

# Acquire a bearer token and confirm cache state
dotnet run --project src/StellaOps.Cli -- auth login
dotnet run --project src/StellaOps.Cli -- auth status
dotnet run --project src/StellaOps.Cli -- auth whoami
```
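
Because `STELLAOPS_AUTHORITY_SCOPE` is a space-separated list, a wrapper script may want to confirm that a particular scope is present before calling the backend. A minimal sketch of such a check follows. The function name `has_scope` is hypothetical, and the check only inspects the local variable; it says nothing about what Authority will actually grant.

```bash
# Hypothetical helper: succeed when a space-separated scope string contains
# the given scope as a whole word (no partial matches).
has_scope() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use is `has_scope "$STELLAOPS_AUTHORITY_SCOPE" advisory:ingest || echo "advisory:ingest not requested" >&2` ahead of an ingest-related command.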

Refer to `docs/dev/32_AUTH_CLIENT_GUIDE.md` for deeper guidance on tuning retry/offline settings and rollout checklists.

To persist configuration, you can create `stellaops-cli.yaml` next to the binary or
rely on environment variables for ephemeral runners.

---

## 3 · Operating Workflow

1. **Trigger connector fetch stages**

   ```bash
   dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage fetch
   dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage parse
   dotnet run --project src/StellaOps.Cli -- db fetch --source osv --stage map
   ```

   Use `--mode resume` when continuing from a previous window:

   ```bash
   dotnet run --project src/StellaOps.Cli -- db fetch --source redhat --stage fetch --mode resume
   ```

2. **Merge canonical advisories**

   ```bash
   dotnet run --project src/StellaOps.Cli -- db merge
   ```

3. **Produce exports**

   ```bash
   # JSON tree (vuln-list style)
   dotnet run --project src/StellaOps.Cli -- db export --format json

   # Trivy DB (delta example)
   dotnet run --project src/StellaOps.Cli -- db export --format trivy-db --delta
   ```

   Concelier always produces a deterministic OCI layout. The first run after a clean
   bootstrap emits a **full** baseline; subsequent `--delta` runs reuse the previous
   baseline’s blobs when only JSON manifests change. If the exporter detects that a
   prior delta is still active (i.e., `LastDeltaDigest` is recorded) it automatically
   upgrades the next run to a full export and resets the baseline so operators never
   chain deltas indefinitely. The CLI exposes `--publish-full/--publish-delta` (for
   ORAS pushes) and `--include-full/--include-delta` (for offline bundles) should you
   need to override the defaults interactively.

   **Smoke-check delta reuse:** after the first baseline completes, run the export a
   second time with `--delta` and verify that the new directory reports `mode=delta`
   while reusing the previous layer blob.

   ```bash
   export_root=${CONCELIER_EXPORT_ROOT:-exports/trivy}
   base=$(ls -1d "$export_root"/* | sort | tail -n2 | head -n1)
   delta=$(ls -1d "$export_root"/* | sort | tail -n1)

   jq -r '.mode,.baseExportId' "$delta/metadata.json"

   base_manifest=$(jq -r '.manifests[0].digest' "$base/index.json")
   delta_manifest=$(jq -r '.manifests[0].digest' "$delta/index.json")
   printf 'baseline manifest: %s\ndelta manifest: %s\n' "$base_manifest" "$delta_manifest"

   layer_digest=$(jq -r '.layers[0].digest' "$base/blobs/sha256/${base_manifest#sha256:}")
   cmp "$base/blobs/sha256/${layer_digest#sha256:}" \
       "$delta/blobs/sha256/${layer_digest#sha256:}"
   ```

   `cmp` returning exit code `0` confirms the delta export reuses the baseline’s
   `db.tar.gz` layer instead of rebuilding it.

4. **Manage scanners (optional)**

   ```bash
   dotnet run --project src/StellaOps.Cli -- scanner download --channel stable
   dotnet run --project src/StellaOps.Cli -- scan run --entry scanners/latest/Scanner.dll --target ./sboms
   dotnet run --project src/StellaOps.Cli -- scan upload --file results/scan-001.json
   ```

   Add `--verbose` to any command for structured console logs. All commands honour
   `Ctrl+C` cancellation and exit with non-zero status codes when the backend returns
   a problem document.
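
Since the commands exit non-zero on backend errors, the three connector stages from step 1 can be chained so that a failed stage stops the run. The wrapper below is a sketch under that assumption. `run_stages` and the `STELLA_CLI` variable are hypothetical names introduced here, and `STELLA_CLI` defaults to the `dotnet run` invocation used throughout this section.

```bash
# Hypothetical wrapper: run fetch -> parse -> map for one source and stop at
# the first stage that exits non-zero. STELLA_CLI is an assumed variable that
# holds the CLI invocation; it defaults to the dotnet run form shown above.
# Note: source_name and stage are plain (global) shell variables.
run_stages() {
  source_name=$1
  for stage in fetch parse map; do
    # The expansion is intentionally unquoted so it splits into command + args.
    ${STELLA_CLI:-dotnet run --project src/StellaOps.Cli --} \
      db fetch --source "$source_name" --stage "$stage" || return
  done
}
```

For example, `run_stages osv && dotnet run --project src/StellaOps.Cli -- db merge` only proceeds to the merge when all three stages succeeded.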

---

## 4 · Verification Checklist

- Concelier `/health` returns `"status":"healthy"` and Storage bootstrap is marked
  complete after startup.
- CLI commands return HTTP 202 with a `Location` header (job tracking URL) when
  triggering Concelier jobs.
- Export artefacts are materialised under the configured output directories and
  their manifests record digests.
- MongoDB contains the expected `document`, `dto`, `advisory`, and `export_state`
  collections after a run.

---

## 5 · Deployment Automation

- Treat `etc/concelier.yaml.sample` as the canonical template. CI/CD should copy it to
  the deployment artifact and replace placeholders (DSN, telemetry endpoints, cron
  overrides) with environment-specific secrets.
- Keep secret material (Mongo credentials, OTLP tokens) outside of the repository;
  inject them via secret stores or pipeline variables at stamp time.
- When building container images, include `trivy-db` (and `oras` if used) so air-gapped
  clusters do not need outbound downloads at runtime.

---

## 5 · Next Steps

- Enable authority-backed authentication in non-production first. Set
  `authority.enabled: true` while keeping `authority.allowAnonymousFallback: true`
  to observe logs, then flip it to `false` before 2025-12-31 UTC to enforce tokens.
- Automate the workflow above via CI/CD (compose stack or Kubernetes CronJobs).
- Pair with the Concelier connector teams when enabling additional sources so their
  module-specific requirements are pulled in safely.

---

## 6 · Authority Integration

- Concelier now authenticates callers through StellaOps Authority using OAuth 2.0
  resource server flows. Populate the `authority` block in `concelier.yaml`:

  ```yaml
  authority:
    enabled: true
    allowAnonymousFallback: false   # keep true only during the staged rollout window
    issuer: "https://authority.example.org"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
    clientId: "concelier-jobs"
    clientSecretFile: "../secrets/concelier-jobs.secret"
    clientScopes:
      - "concelier.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
  ```

- Store the client secret outside of source control. Either provide it via
  `authority.clientSecret` (environment variable `CONCELIER_AUTHORITY__CLIENTSECRET`)
  or point `authority.clientSecretFile` to a file mounted at runtime.
- Cron jobs running on the same host can keep using the API thanks to the loopback
  bypass mask. Add additional CIDR ranges as needed; every bypass is logged.
- Export the same configuration to Kubernetes or systemd by setting environment
  variables such as:

  ```bash
  export CONCELIER_AUTHORITY__ENABLED=true
  export CONCELIER_AUTHORITY__ALLOWANONYMOUSFALLBACK=false
  export CONCELIER_AUTHORITY__ISSUER="https://authority.example.org"
  export CONCELIER_AUTHORITY__CLIENTID="concelier-jobs"
  export CONCELIER_AUTHORITY__CLIENTSECRETFILE="/var/run/secrets/concelier/authority-client"
  ```

  ```yaml
  authority:
    enabled: true
    allowAnonymousFallback: false   # keep true only during the staged rollout window
    issuer: "https://authority.example.org"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
      - "advisory:read"
      - "advisory:ingest"
    requiredTenants:
      - "tenant-default"
    clientId: "concelier-jobs"
    clientSecretFile: "../secrets/concelier-jobs.secret"
    clientScopes:
      - "concelier.jobs.trigger"
      - "advisory:read"
      - "advisory:ingest"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
  ```

- Store the client secret outside of source control. Either provide it via
  `authority.clientSecret` (environment variable `CONCELIER_AUTHORITY__CLIENTSECRET`)
  or point `authority.clientSecretFile` to a file mounted at runtime.
- Cron jobs running on the same host can keep using the API thanks to the loopback
  bypass mask. Add additional CIDR ranges as needed; every bypass is logged.
- Export the same configuration to Kubernetes or systemd by setting environment
  variables such as:

  ```bash
  export CONCELIER_AUTHORITY__ENABLED=true
  export CONCELIER_AUTHORITY__ALLOWANONYMOUSFALLBACK=false
  export CONCELIER_AUTHORITY__ISSUER="https://authority.example.org"
  export CONCELIER_AUTHORITY__CLIENTID="concelier-jobs"
  export CONCELIER_AUTHORITY__CLIENTSECRETFILE="/var/run/secrets/concelier/authority-client"
  export CONCELIER_AUTHORITY__REQUIREDSCOPES__0="concelier.jobs.trigger"
  export CONCELIER_AUTHORITY__REQUIREDSCOPES__1="advisory:read"
  export CONCELIER_AUTHORITY__REQUIREDSCOPES__2="advisory:ingest"
  export CONCELIER_AUTHORITY__CLIENTSCOPES__0="concelier.jobs.trigger"
  export CONCELIER_AUTHORITY__CLIENTSCOPES__1="advisory:read"
  export CONCELIER_AUTHORITY__CLIENTSCOPES__2="advisory:ingest"
  export CONCELIER_AUTHORITY__REQUIREDTENANTS__0="tenant-default"
  ```

- CLI commands already pass `Authorization` headers when credentials are supplied.
  Configure the CLI with matching Authority settings (`docs/09_API_CLI_REFERENCE.md`)
  so that automation can obtain tokens with the same client credentials. Concelier
|
||||
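
The export list above follows a visible naming pattern: a `CONCELIER_` prefix, `__` between nesting levels, upper-cased key names, and a zero-based index for each list element. The sketch below reproduces that flattening in Python, assuming the YAML has already been loaded into a plain dictionary. The `flatten_to_env` helper is hypothetical and is not part of Concelier.

```python
def flatten_to_env(config, prefix="CONCELIER"):
    """Flatten a nested dict/list config into PREFIX_SECTION__KEY__INDEX variables."""
    env = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, path + [str(key).upper()])
        elif isinstance(node, list):
            # List elements are addressed by their zero-based index.
            for index, value in enumerate(node):
                walk(value, path + [str(index)])
        elif isinstance(node, bool):
            # Booleans become lowercase "true"/"false", matching the exports above.
            env[f"{prefix}_{'__'.join(path)}"] = "true" if node else "false"
        else:
            env[f"{prefix}_{'__'.join(path)}"] = str(node)

    walk(config, [])
    return env


if __name__ == "__main__":
    sample = {
        "authority": {
            "enabled": True,
            "issuer": "https://authority.example.org",
            "requiredScopes": ["concelier.jobs.trigger", "advisory:read"],
        }
    }
    for name, value in flatten_to_env(sample).items():
        print(f'export {name}="{value}"')
```

Generating the variables from the same source as the YAML file is one way to keep the two representations from drifting apart.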
|
||||
@@ -36,21 +36,59 @@ Authority persists every issued token in MongoDB so operators can audit or revok
|
||||
|
||||
- **Collection:** `authority_tokens`
|
||||
- **Key fields:**
|
||||
  - `tokenId`, `type` (`access_token`, `refresh_token`, `device_code`, `authorization_code`)
  - `subjectId`, `clientId`, ordered `scope` array
  - `tenant` (lower-cased tenant hint from the issuing client, omitted for global clients)
  - `status` (`valid`, `revoked`, `expired`), `createdAt`, optional `expiresAt`
  - `revokedAt`, machine-readable `revokedReason`, optional `revokedReasonDescription`
  - `revokedMetadata` (string dictionary for plugin-specific context)
- **Persistence flow:** `PersistTokensHandler` stamps missing JWT IDs, normalises scopes, and stores every principal emitted by OpenIddict.
- **Revocation flow:** `AuthorityTokenStore.UpdateStatusAsync` flips status, records the reason metadata, and is invoked by token revocation handlers and plugin provisioning events (e.g., disabling a user).
- **Expiry maintenance:** `AuthorityTokenStore.DeleteExpiredAsync` prunes non-revoked tokens past their `expiresAt` timestamp. Operators should schedule this in maintenance windows if large volumes of tokens are issued.
|
||||
|
||||
### Expectations for resource servers
|
||||
Resource servers (Concelier WebService, Backend, Agent) **must not** assume in-memory caches are authoritative. They should:
|
||||
|
||||
- cache `/jwks` and `/revocations/export` responses within configured lifetimes;
|
||||
- honour `revokedReason` metadata when shaping audit trails;
|
||||
- treat `status != "valid"` or missing tokens as immediate denial conditions.
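
The denial rule in the list above can be sketched as a small check, assuming the resource server has looked the token up and holds either a record shaped like an `authority_tokens` document or nothing at all. The function names are illustrative only; the field names (`tokenId`, `status`, `revokedReason`) come from the collection description.

```python
from typing import Optional


def is_token_accepted(record: Optional[dict]) -> bool:
    """Accept only when a record exists and its status is exactly "valid"."""
    if record is None:
        return False  # missing token: immediate denial
    return record.get("status") == "valid"


def audit_entry(record: dict) -> dict:
    """Build an audit-trail entry that carries the revocation reason when present."""
    entry = {"tokenId": record.get("tokenId"), "status": record.get("status")}
    if record.get("status") == "revoked":
        entry["revokedReason"] = record.get("revokedReason")
    return entry
```

Treating anything other than an explicit `valid` as a denial means an unexpected or future status value fails closed.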
|
||||
|
||||
### Tenant propagation
|
||||
|
||||
- Client provisioning (bootstrap or plug-in) accepts a `tenant` hint. Authority normalises the value (`trim().ToLowerInvariant()`) and persists it alongside the registration. Clients without an explicit tenant remain global.
|
||||
- Issued principals include the `stellaops:tenant` claim. `PersistTokensHandler` mirrors this claim into `authority_tokens.tenant`, enabling per-tenant revocation and reporting.
|
||||
- Rate limiter metadata now tags requests with `authority.tenant`, unlocking per-tenant throughput metrics and diagnostic filters. Audit events (`authority.client_credentials.grant`, `authority.password.grant`, bootstrap flows) surface the tenant, and login-attempt documents are indexed on `{tenant, occurredAt}` for quick queries.
|
||||
- Password grant flows reuse the client registration's tenant and enforce the configured scope allow-list. Requested scopes outside that list (or mismatched tenants) trigger `invalid_scope`/`invalid_client` failures, ensuring cross-tenant access is denied before token issuance.
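
A minimal sketch of the two rules above, tenant normalisation and the scope allow-list check, follows. It rests on assumptions: the registration is a dictionary with `tenant` and `allowedScopes` keys, `GrantError` is a hypothetical exception type, and the pairing of each failure with `invalid_client` or `invalid_scope` is a guess, since the text names both codes without saying which applies where. Python's `str.lower()` only approximates .NET's `ToLowerInvariant()` for non-ASCII input.

```python
class GrantError(Exception):
    """Hypothetical error carrying an OAuth-style error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def normalise_tenant(value):
    """Trim and lower-case a tenant hint; empty or missing hints mean a global client."""
    if value is None:
        return None
    tenant = value.strip().lower()
    return tenant or None


def check_password_grant(registration, requested_scopes, requested_tenant=None):
    """Return (tenant, scopes) if the request is allowed, else raise GrantError."""
    tenant = normalise_tenant(registration.get("tenant"))
    if requested_tenant is not None and normalise_tenant(requested_tenant) != tenant:
        raise GrantError("invalid_client")  # assumed mapping: tenant mismatch
    allowed = set(registration.get("allowedScopes", []))
    if not set(requested_scopes) <= allowed:
        raise GrantError("invalid_scope")  # assumed mapping: scope outside allow-list
    return tenant, sorted(requested_scopes)
```

Both checks run before anything is issued, which matches the stated goal of denying cross-tenant access ahead of token issuance.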
|
||||
|
||||
### Default service scopes
|
||||
|
||||
| Client ID | Purpose | Scopes granted | Sender constraint | Tenant |
|----------------------|---------------------------------------|--------------------------------------|-------------------|-----------------|
| `concelier-ingest` | Concelier raw advisory ingestion | `advisory:ingest`, `advisory:read` | `dpop` | `tenant-default` |
| `excitor-ingest` | Excititor raw VEX ingestion | `vex:ingest`, `vex:read` | `dpop` | `tenant-default` |
| `aoc-verifier` | Aggregation-only contract verification | `aoc:verify` | `dpop` | `tenant-default` |
| `cartographer-service` | Graph snapshot construction | `graph:write`, `graph:read` | `dpop` | `tenant-default` |
| `graph-api` | Graph Explorer gateway/API | `graph:read`, `graph:export`, `graph:simulate` | `dpop` | `tenant-default` |
| `vuln-explorer-ui` | Vuln Explorer UI/API | `vuln:read` | `dpop` | `tenant-default` |
|
||||
|
||||
> **Secret hygiene (2025‑10‑27):** The repository includes a convenience `etc/authority.yaml` for compose/helm smoke tests. Every entry’s `secretFile` points to `etc/secrets/*.secret`, which ship with `*-change-me` placeholders—replace them with strong values (and wire them through your vault/secret manager) before issuing tokens in CI, staging, or production.
|
||||
|
||||
These registrations are provided as examples in `etc/authority.yaml.sample`. Clone them per tenant (for example `concelier-tenant-a`, `concelier-tenant-b`) so tokens remain tenant-scoped by construction.
|
||||
|
||||
Graph Explorer introduces dedicated scopes: `graph:write` for Cartographer build jobs, `graph:read` for query/read operations, `graph:export` for long-running export downloads, and `graph:simulate` for what-if overlays. Assign only the scopes a client actually needs to preserve least privilege—UI-facing clients should typically request read/export access, while background services (Cartographer, Scheduler) require write privileges.
|
||||
|
||||
#### Least-privilege guidance for graph clients
|
||||
|
||||
- **Service identities** – The Cartographer worker should request `graph:write` and `graph:read` only; grant `graph:simulate` exclusively to pipeline automation that invokes Policy Engine overlays on demand. Keep `graph:export` scoped to API gateway components responsible for streaming GraphML/JSONL artifacts. Authority enforces this by rejecting `graph:write` tokens that lack `properties.serviceIdentity: cartographer`.
|
||||
- **Tenant propagation** – Every client registration must pin a `tenant` hint. Authority normalises the value and stamps it into issued tokens (`stellaops:tenant`) so downstream services (Scheduler, Graph API, Console) can enforce tenant isolation without custom headers. Graph scopes (`graph:read`, `graph:write`, `graph:export`, `graph:simulate`) are denied if the tenant hint is missing.
|
||||
- **SDK alignment** – Use the generated `StellaOpsScopes` constants in service code to request graph scopes. Hard-coded strings risk falling out of sync as additional graph capabilities are added.
|
||||
- **DPoP for automation** – Maintain sender-constrained (`dpop`) flows for Cartographer and Scheduler to limit reuse of access tokens if a build host is compromised. For UI-facing tokens, pair `graph:read`/`graph:export` with short lifetimes and enforce refresh-token rotation at the gateway.
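
The two enforcement rules stated above (graph scopes need a tenant hint, and `graph:write` needs `properties.serviceIdentity: cartographer`) can be checked before a registration is submitted. This is a pre-flight sketch only: `graph_scope_violations` is a hypothetical helper and does not represent Authority's actual implementation.

```python
GRAPH_SCOPES = {"graph:read", "graph:write", "graph:export", "graph:simulate"}


def graph_scope_violations(scopes, tenant, properties):
    """Return a list of human-readable problems; an empty list means the client passes."""
    requested = set(scopes) & GRAPH_SCOPES
    problems = []
    if requested and not tenant:
        problems.append("graph scopes require a tenant hint")
    if "graph:write" in requested and properties.get("serviceIdentity") != "cartographer":
        problems.append("graph:write requires properties.serviceIdentity: cartographer")
    return problems
```

Running a check like this in CI against the client registration file catches a misconfigured client before Authority rejects its token requests.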
|
||||
|
||||
#### Vuln Explorer permalinks
|
||||
|
||||
- **Scope** – `vuln:read` authorises Vuln Explorer to fetch advisory/linkset evidence and issue shareable links. Assign it only to front-end/API clients that must render vulnerability details.
|
||||
- **Signed links** – `POST /permalinks/vuln` (requires `vuln:read`) accepts `{ "tenant": "tenant-a", "resourceKind": "vulnerability", "state": { ... }, "expiresInSeconds": 86400 }` and returns a JWT (`token`) plus `issuedAt`/`expiresAt`. The token embeds the tenant, requested state, and `vuln:read` scope and is signed with the same Authority signing keys published via `/jwks`.
|
||||
- **Validation** – Resource servers verify the permalink using cached JWKS: check signature, ensure the tenant matches the current request context, honour the expiry, and enforce the contained `vuln:read` scope. The payload’s `resource.state` block is opaque JSON so UIs can round-trip filters/search terms without new schema changes.
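
The validation steps above, minus signature verification, can be sketched over an already-decoded claim set. The claim names are assumptions: `exp` is the standard JWT expiry claim and `scope` is taken to be a space-delimited string, while the name `tenant` is a guess because the text does not say how the permalink token names its tenant claim. Signature checking against the cached JWKS is deliberately left out and must happen first.

```python
import time


def validate_permalink_claims(claims, request_tenant, now=None):
    """Check tenant, expiry and scope on decoded permalink claims (signature not checked here)."""
    now = time.time() if now is None else now
    if claims.get("tenant") != request_tenant:
        return False  # tenant must match the current request context
    if claims.get("exp", 0) <= now:
        return False  # expired or missing expiry
    scopes = str(claims.get("scope", "")).split()
    return "vuln:read" in scopes
```

The opaque `resource.state` block is not inspected, so UIs can keep evolving it without touching the validator.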
|
||||
|
||||
## 4. Revocation Pipeline
|
||||
Authority centralises revocation in `authority_revocations` with deterministic categories:
|
||||
@@ -119,18 +157,38 @@ Authority signs revocation bundles and publishes JWKS entries via the new signin
|
||||
The rotation API leverages the same cryptography abstractions as revocation signing; no restart is required and the previous key is marked `retired` but kept available for verification.
|
||||
|
||||
## 6. Bootstrap & Administrative Endpoints
|
||||
Administrative APIs live under `/internal/*` and require the bootstrap API key plus rate-limiter compliance.
|
||||
|
||||
| Endpoint | Method | Description |
| --- | --- | --- |
| `/internal/users` | `POST` | Provision initial administrative accounts through the registered password-capable plug-in. Emits structured audit events. |
| `/internal/clients` | `POST` | Provision OAuth clients (client credentials / device code). |
| `/internal/revocations/export` | `GET` | Export revocation bundle + detached JWS + digest. |
| `/internal/signing/rotate` | `POST` | Promote a new signing key (see SOP above). Request body accepts `keyId`, `location`, optional `source`, `algorithm`, `provider`, and metadata. |
|
||||
|
||||
All administrative calls emit `AuthEventRecord` entries enriched with correlation IDs, PII tags, and network metadata for offline SOC ingestion.
|
||||
|
||||
> **Tenant hint:** include a `tenant` entry inside `properties` when bootstrapping clients. Authority normalises the value, stores it on the registration, and stamps future tokens/audit events with the tenant.
|
||||
|
||||
### Bootstrap client example
|
||||
|
||||
```jsonc
POST /internal/clients
{
  "clientId": "concelier",
  "confidential": true,
  "displayName": "Concelier Backend",
  "allowedGrantTypes": ["client_credentials"],
  "allowedScopes": ["concelier.jobs.trigger", "advisory:ingest", "advisory:read"],
  "properties": {
    "tenant": "tenant-default"
  }
}
```
|
||||
|
||||
For environments with multiple tenants, repeat the call per tenant-specific client (e.g. `concelier-tenant-a`, `concelier-tenant-b`) or append suffixes to the client identifier.
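
Repeating the bootstrap call per tenant is easier with generated request bodies. The sketch below builds one `POST /internal/clients` payload per tenant using the fields from the example above. The `client_payload` helper, the `displayName` format, and the tenant names are illustrative assumptions, and the HTTP call itself, including the bootstrap API key, is left out.

```python
import json


def client_payload(tenant, base_id="concelier"):
    """Build a tenant-specific client registration body, e.g. concelier-tenant-a."""
    return {
        "clientId": f"{base_id}-{tenant}",
        "confidential": True,
        "displayName": f"Concelier Backend ({tenant})",  # assumed naming scheme
        "allowedGrantTypes": ["client_credentials"],
        "allowedScopes": ["concelier.jobs.trigger", "advisory:ingest", "advisory:read"],
        "properties": {"tenant": tenant},
    }


if __name__ == "__main__":
    for tenant in ("tenant-a", "tenant-b"):
        print(json.dumps(client_payload(tenant), indent=2))
```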
|
||||
|
||||
## 7. Configuration Reference
|
||||
|
||||
| Section | Key | Description | Notes |
|
||||
| --- | --- | --- | --- |
|
||||
@@ -181,11 +239,45 @@ Authority now understands two flavours of sender-constrained OAuth clients:
|
||||
- Certificate bindings now act as an allow-list: Authority verifies thumbprint, subject, issuer, serial number, and any declared SAN values against the presented certificate, with rotation grace windows applied to `notBefore/notAfter`. Operators can enforce subject regexes, SAN type allow-lists (`dns`, `uri`, `ip`), trusted certificate authorities, and rotation grace via `security.senderConstraints.mtls.*`.
|
||||
|
||||
Both modes persist additional metadata in `authority_tokens`: `senderConstraint` records the enforced policy, while `senderKeyThumbprint` stores the DPoP JWK thumbprint or mTLS certificate hash captured at issuance. Downstream services can rely on these fields (and the corresponding `cnf` claim) when auditing offline copies of the token store.
|
||||
|
||||
### 7.2 Policy Engine clients & scopes
|
||||
|
||||
Policy Engine v2 introduces dedicated scopes and a service identity that materialises effective findings. Configure Authority as follows when provisioning policy clients:
|
||||
|
||||
| Client | Scopes | Notes |
| --- | --- | --- |
| `policy-engine` (service) | `policy:run`, `findings:read`, `effective:write` | Must include `properties.serviceIdentity: policy-engine` and a tenant. Authority rejects `effective:write` tokens without the marker or tenant. |
| `policy-cli` / automation | `policy:write`, `policy:submit`, `policy:run`, `findings:read` | Keep scopes minimal; only trusted automation should add `policy:approve`/`policy:activate`. |
| UI/editor sessions | `policy:read`, `policy:write`, `policy:simulate` (+ reviewer/approver scopes as appropriate) | Issue tenant-specific clients so audit and rate limits remain scoped. |
|
||||
|
||||
Sample YAML entry:
|
||||
|
||||
```yaml
- clientId: "policy-engine"
  displayName: "Policy Engine Service"
  grantTypes: [ "client_credentials" ]
  audiences: [ "api://policy-engine" ]
  scopes: [ "policy:run", "findings:read", "effective:write" ]
  tenant: "tenant-default"
  properties:
    serviceIdentity: "policy-engine"
  senderConstraint: "dpop"
  auth:
    type: "client_secret"
    secretFile: "../secrets/policy-engine.secret"
```
|
||||
|
||||
Compliance checklist:
|
||||
|
||||
- [ ] `policy-engine` client includes `properties.serviceIdentity: policy-engine` and a tenant hint; logins missing either are rejected.
|
||||
- [ ] Non-service clients omit `effective:write` and receive only the scopes required for their role (`policy:write`, `policy:submit`, `policy:approve`, `policy:activate`, etc.).
|
||||
- [ ] Approval/activation workflows use identities distinct from authoring identities; tenants are provisioned per client to keep telemetry segregated.
|
||||
- [ ] Operators document reviewer assignments and incident procedures alongside `/docs/security/policy-governance.md` and archive policy evidence bundles (`stella policy bundle export`) with each release.
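
The first two checklist items lend themselves to an automated check. The sketch below assumes a client entry has been loaded into a dictionary shaped like the sample YAML (`scopes`, `tenant`, `properties`). The `policy_client_violations` helper is hypothetical and only mirrors the rules stated in this section.

```python
def policy_client_violations(client):
    """Return problems for a client that requests effective:write without the required markers."""
    scopes = set(client.get("scopes", []))
    properties = client.get("properties") or {}
    problems = []
    if "effective:write" in scopes:
        if properties.get("serviceIdentity") != "policy-engine":
            problems.append("effective:write requires properties.serviceIdentity: policy-engine")
        if not client.get("tenant"):
            problems.append("effective:write requires a tenant")
    return problems
```

The remaining items (separate approver identities, documented reviewer assignments) are process controls and still need manual review.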
|
||||
|
||||
## 8. Offline & Sovereign Operation
|
||||
- **No outbound dependencies:** Authority only contacts MongoDB and local plugins. Discovery and JWKS are cached by clients with offline tolerances (`AllowOfflineCacheFallback`, `OfflineCacheTolerance`). Operators should mirror these responses for air-gapped use.
|
||||
- **Structured logging:** Every revocation export, signing rotation, bootstrap action, and token issuance emits structured logs with `traceId`, `client_id`, `subjectId`, and `network.remoteIp` where applicable. Mirror logs to your SIEM to retain audit trails without central connectivity.
|
||||
- **Determinism:** Sorting rules in token and revocation exports guarantee byte-for-byte identical artefacts given the same datastore state. Hashes and signatures remain stable across machines.
|
||||
|
||||
## 9. Operational Checklist
|
||||
- [ ] Protect the bootstrap API key and disable bootstrap endpoints (`bootstrap.enabled: false`) once initial setup is complete.
|
||||
|
||||
@@ -232,6 +232,101 @@ Images are deduplicated and sorted by digest. Label keys are normalised to lower
|
||||
```
|
||||
|
||||
Metadata keys are lowercased, first‑writer wins (duplicates with different casing are ignored), and optional IDs (`scheduleId`, `runId`) are trimmed when empty. Use the canonical serializer when emitting events so audit digests remain reproducible.
|
||||
|
||||
#### 3.1.5 Run Summary (`run_summaries`)
|
||||
|
||||
Materialized view powering the Scheduler UI dashboards. Stores the latest roll-up per schedule/tenant, enabling quick “last run” banners and sparkline counters without scanning the full `runs` collection.
|
||||
|
||||
```jsonc
{
  "tenantId": "tenant-alpha",
  "scheduleId": "sch_20251018a",
  "updatedAt": "2025-10-18T22:10:10Z",
  "lastRun": {
    "runId": "run_20251018_0001",
    "trigger": "feedser",
    "state": "completed",
    "createdAt": "2025-10-18T22:03:14Z",
    "startedAt": "2025-10-18T22:03:20Z",
    "finishedAt": "2025-10-18T22:08:45Z",
    "stats": {
      "candidates": 1280,
      "deduped": 910,
      "queued": 0,
      "completed": 910,
      "deltas": 42,
      "newCriticals": 7,
      "newHigh": 11,
      "newMedium": 18,
      "newLow": 6
    },
    "error": null
  },
  "recent": [
    {
      "runId": "run_20251018_0001",
      "trigger": "feedser",
      "state": "completed",
      "createdAt": "2025-10-18T22:03:14Z",
      "startedAt": "2025-10-18T22:03:20Z",
      "finishedAt": "2025-10-18T22:08:45Z",
      "stats": {
        "candidates": 1280,
        "deduped": 910,
        "queued": 0,
        "completed": 910,
        "deltas": 42,
        "newCriticals": 7,
        "newHigh": 11,
        "newMedium": 18,
        "newLow": 6
      },
      "error": null
    },
    {
      "runId": "run_20251017_0003",
      "trigger": "cron",
      "state": "error",
      "createdAt": "2025-10-17T22:01:02Z",
      "startedAt": "2025-10-17T22:01:08Z",
      "finishedAt": "2025-10-17T22:04:11Z",
      "stats": {
        "candidates": 1040,
        "deduped": 812,
        "queued": 0,
        "completed": 640,
        "deltas": 18,
        "newCriticals": 2,
        "newHigh": 4,
        "newMedium": 7,
        "newLow": 3
      },
      "error": "scanner timeout"
    }
  ],
  "counters": {
    "total": 3,
    "planning": 0,
    "queued": 0,
    "running": 0,
    "completed": 1,
    "error": 1,
    "cancelled": 1,
    "totalDeltas": 60,
    "totalNewCriticals": 9,
    "totalNewHigh": 15,
    "totalNewMedium": 25,
    "totalNewLow": 9
  }
}
```
|
||||
|
||||
- `_id` combines `tenantId` and `scheduleId` (`tenant:schedule`).
|
||||
- `recent` contains the 20 most recent runs ordered by `createdAt` (UTC). Updates replace the existing entry for a run to respect state transitions.
|
||||
- `counters` aggregate over the retained window (20 runs) for quick trend indicators. Totals are recomputed after every update.
|
||||
- Schedulers should call the projection service after every run state change so the cache mirrors planner/runner progress.
|
||||
|
||||
Sample file: `samples/api/scheduler/run-summary.json`.
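
The projection rules above (replace the entry for a run, keep the 20 most recent by `createdAt`, recompute counters over that window) can be sketched as follows. The function names are hypothetical and the newest-first ordering is inferred from the sample. Note that the sample's `counters.total` is 3 while only two runs appear under `recent`, so the sample looks abbreviated; this sketch counts only what is in the list it is given.

```python
RETAINED_RUNS = 20
STATES = ("planning", "queued", "running", "completed", "error", "cancelled")
STAT_TOTALS = {
    "deltas": "totalDeltas",
    "newCriticals": "totalNewCriticals",
    "newHigh": "totalNewHigh",
    "newMedium": "totalNewMedium",
    "newLow": "totalNewLow",
}


def upsert_run(recent, run):
    """Replace any existing entry for run['runId'], then keep the newest 20 by createdAt."""
    merged = [r for r in recent if r["runId"] != run["runId"]] + [run]
    # ISO-8601 UTC timestamps in one fixed format sort correctly as plain strings.
    merged.sort(key=lambda r: r["createdAt"], reverse=True)
    return merged[:RETAINED_RUNS]


def compute_counters(recent):
    """Recompute state counts and stat totals over the retained window."""
    counters = {"total": len(recent)}
    for state in STATES:
        counters[state] = sum(1 for r in recent if r["state"] == state)
    for field, total_name in STAT_TOTALS.items():
        counters[total_name] = sum(r["stats"].get(field, 0) for r in recent)
    return counters
```

Recomputing totals from the window on every update, rather than incrementing them, keeps the counters correct when a run's entry is replaced after a state transition.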
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -65,6 +65,9 @@ graph LR
|
||||
`ops/devops/release/build_release.py` to build multi-arch images, attach
|
||||
CycloneDX SBOMs and SLSA provenance with Cosign, and emit
|
||||
`out/release/release.yaml` for downstream packaging (Helm, Compose, Offline Kit).
|
||||
The `build-test-deploy` workflow also runs
|
||||
`python ops/devops/release/test_verify_release.py` so release verifier
|
||||
regressions fail fast during every CI pass.
|
||||
|
||||
---
|
||||
|
||||
@@ -100,11 +103,12 @@ ouk fetch \
|
||||
--sign cosign.key
|
||||
```
|
||||
|
||||
### 4.2 Pipeline Hook
|
||||
|
||||
* Runs on **first Friday** each month (cron).
|
||||
* Generates tarball, signs it, uploads to **GitLab Release asset**.
|
||||
* SHA‑256 + signature published alongside.
|
||||
* Release job must emit `out/release/debug/` with `debug-manifest.json` and `.sha256` so `ops/offline-kit/mirror_debug_store.py` can mirror symbols into the Offline Kit (see `DEVOPS-REL-17-004`).
|
||||
|
||||
### 4.3 Activation Flow (runtime)
|
||||
|
||||
@@ -123,12 +127,13 @@ CI job fails if token expiry < 29 days (guard against stale caches).
|
||||
|
||||
## 5 Artifact Signing & Transparency
|
||||
|
||||
| Artefact | Signer | Tool/Notes |
| ------------ | --------------- | ---------------------------------- |
| Git tags | GPG (`0x90C4…`) | `git tag -s` |
| Containers | Cosign key pair | `cosign sign` |
| Helm Charts | prov file | `helm package --sign` |
| OUK tarballs | Cosign | `cosign sign-blob` |
| Debug store | — | `debug/debug-manifest.json` hashed |
|
||||
|
||||
**Rekor** integration is **TODO** – once the internal Rekor mirror is online (`StellaOpsAttestor`) a post‑publish job will submit transparency log entries.
|
||||
|
||||
@@ -141,9 +146,20 @@ CI job fails if token expiry < 29 days (guard against stale caches).
|
||||
3. Tag `git tag -s X.Y.Z -m "Release X.Y.Z"` & push.
|
||||
4. GitLab CI auto‑publishes images & charts.
|
||||
5. Draft GitLab **Release Notes** using `tools/release-notes-gen`.
|
||||
6. Verify SBOM attachment with `stella sbom verify stella/backend:X.Y.Z`.
7. Run the release verifier locally if CI isn’t available (mirrors the workflow step):
   `python ops/devops/release/test_verify_release.py`
8. Mirror the release debug store into the Offline Kit staging tree and re-check the manifest:
   ```bash
   ./ops/offline-kit/mirror_debug_store.py \
     --release-dir out/release \
     --offline-kit-dir out/offline-kit
   jq '.artifacts | length' out/offline-kit/debug/debug-manifest.json
   readelf -n /app/... | grep -i 'Build ID'
   ```
   Validate that the hash from `readelf` matches the `.build-id/<aa>/<rest>.debug` path created by the script.
9. Smoke-test OUK tarball in offline lab.
10. Announce in `#stella-release` Mattermost channel.
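
For the build-ID check in step 8, the expected path can be derived from the hash: the first two hex characters name the directory and the remainder names the file. The helper below is a hypothetical convenience for comparing against what `mirror_debug_store.py` produced; it is not taken from that script.

```python
from pathlib import PurePosixPath


def debug_path(build_id: str) -> PurePosixPath:
    """Map a hex build ID to its .build-id/<aa>/<rest>.debug relative path."""
    hex_id = build_id.strip().lower()
    if len(hex_id) < 3 or any(c not in "0123456789abcdef" for c in hex_id):
        raise ValueError(f"not a usable hex build ID: {build_id!r}")
    return PurePosixPath(".build-id") / hex_id[:2] / f"{hex_id[2:]}.debug"


if __name__ == "__main__":
    # Build ID taken from the runtime policy example earlier in this document.
    print(debug_path("5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789"))
    # prints .build-id/5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug
```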
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -1,70 +1,70 @@
|
||||
# Stella Ops — Installation Guide (Docker & Air‑Gap)

<!--
This file is processed by the Eleventy build.
Do **not** hard‑code versions or quota numbers; inherit from
docs/_includes/CONSTANTS.md instead.
{{ dotnet }} → ".NET 10 LTS"
{{ angular }} → "20"
-->

> **Status — public α not yet published.**
> The commands below will work as soon as the first image is tagged
> `registry.stella-ops.org/stella-ops/stella-ops:0.1.0-alpha`
> (target date: **late 2025**). Track progress on the
> [road‑map](/roadmap/).

---

## 0 · Prerequisites

| Item | Minimum | Notes |
|------|---------|-------|
| Linux | Ubuntu 22.04 LTS / Alma 9 | x86‑64 or arm64 |
| CPU / RAM | 2 vCPU / 2 GiB | Laptop baseline |
| Disk | 10 GiB SSD | SBOM + vuln DB cache |
| Docker | **Engine 25 + Compose v2** | `docker -v` |
| TLS | OpenSSL 1.1+ | Self‑signed cert generated at first run |

---

## 1 · Connected‑host install (Docker Compose)

```bash
# 1. Make a working directory
mkdir stella && cd stella

# 2. Download the signed Compose bundle + example .env
curl -LO https://get.stella-ops.org/releases/latest/.env.example
curl -LO https://get.stella-ops.org/releases/latest/.env.example.sig
curl -LO https://get.stella-ops.org/releases/latest/docker-compose.infrastructure.yml
curl -LO https://get.stella-ops.org/releases/latest/docker-compose.infrastructure.yml.sig
curl -LO https://get.stella-ops.org/releases/latest/docker-compose.stella-ops.yml
curl -LO https://get.stella-ops.org/releases/latest/docker-compose.stella-ops.yml.sig

# 3. Verify provenance (Cosign public key is stable)
cosign verify-blob \
  --key https://stella-ops.org/keys/cosign.pub \
  --signature .env.example.sig \
  .env.example

cosign verify-blob \
  --key https://stella-ops.org/keys/cosign.pub \
  --signature docker-compose.infrastructure.yml.sig \
  docker-compose.infrastructure.yml

cosign verify-blob \
  --key https://stella-ops.org/keys/cosign.pub \
  --signature docker-compose.stella-ops.yml.sig \
  docker-compose.stella-ops.yml

# 4. Copy .env.example → .env and edit secrets
cp .env.example .env
$EDITOR .env

# 5. Launch databases (MongoDB + Redis)
docker compose --env-file .env -f docker-compose.infrastructure.yml up -d

# 6. Launch Stella Ops (first run pulls ~50 MB merged vuln DB)
docker compose --env-file .env -f docker-compose.stella-ops.yml up -d
```
|
||||
@@ -95,7 +95,13 @@ The Concelier container reads configuration from `etc/concelier.yaml` plus
|
||||
CONCELIER_AUTHORITY__ISSUER="https://authority.internal"
CONCELIER_AUTHORITY__AUDIENCES__0="api://concelier"
CONCELIER_AUTHORITY__REQUIREDSCOPES__0="concelier.jobs.trigger"
CONCELIER_AUTHORITY__REQUIREDSCOPES__1="advisory:read"
CONCELIER_AUTHORITY__REQUIREDSCOPES__2="advisory:ingest"
CONCELIER_AUTHORITY__REQUIREDTENANTS__0="tenant-default"
CONCELIER_AUTHORITY__CLIENTID="concelier-jobs"
CONCELIER_AUTHORITY__CLIENTSCOPES__0="concelier.jobs.trigger"
CONCELIER_AUTHORITY__CLIENTSCOPES__1="advisory:read"
CONCELIER_AUTHORITY__CLIENTSCOPES__2="advisory:ingest"
CONCELIER_AUTHORITY__CLIENTSECRETFILE="/run/secrets/concelier_authority_client"
CONCELIER_AUTHORITY__BYPASSNETWORKS__0="127.0.0.1/32"
CONCELIER_AUTHORITY__BYPASSNETWORKS__1="::1/128"
|
||||
@@ -132,53 +138,53 @@ The Concelier container reads configuration from `etc/concelier.yaml` plus
|
||||
---
|
||||
|
||||
## 2 · Optional: request a free quota token
|
||||
|
||||
@@ -18,6 +18,8 @@ completely isolated network:
|
||||
| **Attested manifest** | `offline-manifest.json` + detached JWS covering bundle metadata, signed during export. |
|
||||
| **Delta patches** | Daily diff bundles keep size \< 350 MB |
|
||||
| **Scanner plug-ins** | OS analyzers plus the Node.js, Go, .NET, and Python language analyzers packaged under `plugins/scanner/analyzers/**` with manifests so Workers load deterministically offline. |
|
||||
| **Debug store** | `.debug` artefacts laid out under `debug/.build-id/<aa>/<rest>.debug` with `debug/debug-manifest.json` mapping build-ids to originating images for symbol retrieval. |
|
||||
| **Telemetry collector bundle** | `telemetry/telemetry-offline-bundle.tar.gz` plus `.sha256`, containing OTLP collector config, Helm/Compose overlays, and operator instructions. |
|
||||
|
||||
**RU BDU note:** ship the official Russian Trusted Root/Sub CA bundle (`certificates/russian_trusted_bundle.pem`) inside the kit so `concelier:httpClients:source.bdu:trustedRootPaths` can resolve it when the service runs in an air‑gapped network. Drop the most recent `vulxml.zip` alongside the kit if operators need a cold-start cache.
|
||||
|
||||
@@ -25,11 +27,53 @@ completely isolated network:
|
||||
|
||||
*Scanner core:* C# 12 on **.NET {{ dotnet }}**.
|
||||
*Imports are idempotent and atomic — no service downtime.*
|
||||
|
||||
---
|
||||
|
||||
|
||||
## 0 · Prepare the debug store
|
||||
|
||||
Before packaging the Offline Kit, mirror the release debug artefacts (GNU build-id `.debug` files and the associated manifest) into the staging directory:
|
||||
|
||||
```bash
./ops/offline-kit/mirror_debug_store.py \
  --release-dir out/release \
  --offline-kit-dir out/offline-kit
```
|
||||
|
||||
The helper copies `debug/.build-id/**`, validates `debug/debug-manifest.json` against its recorded SHA-256, and writes `out/offline-kit/metadata/debug-store.json` with a short summary (platforms, artefact counts, sample build-ids). The command exits non-zero if an artefact referenced by the manifest is missing or has the wrong digest, so run it as part of every kit build.
|
||||
|
||||
---
|
||||
|
||||
## 0.1 · Automated packaging
|
||||
|
||||
The packaging workflow is scripted via `ops/offline-kit/build_offline_kit.py`.
|
||||
It verifies the release artefacts, runs the Python analyzer smoke suite, mirrors the debug store, and emits a deterministic tarball + manifest set.
|
||||
|
||||
```bash
python ops/offline-kit/build_offline_kit.py \
  --version 2025.10.0 \
  --channel edge \
  --release-dir out/release \
  --staging-dir out/offline-kit/staging \
  --output-dir out/offline-kit/dist

# Optional: regenerate the telemetry collector bundle prior to packaging.
python ops/devops/telemetry/package_offline_bundle.py --output out/telemetry/telemetry-offline-bundle.tar.gz
```
|
||||
|
||||
Outputs:
|
||||
|
||||
- `stella-ops-offline-kit-<version>-<channel>.tar.gz` — bundle (mtime/uid/gid forced to zero for reproducibility)
|
||||
- `stella-ops-offline-kit-<version>-<channel>.tar.gz.sha256` — bundle digest
|
||||
- `manifest/offline-manifest.json` + `.sha256` — inventories every file in the bundle
|
||||
- `<bundle>.metadata.json` — descriptor consumed by the CLI/Console import tooling
|
||||
- `telemetry/telemetry-offline-bundle.tar.gz` + `.sha256` — packaged OTLP collector assets for environments without upstream access
|
||||
- `plugins/scanner/analyzers/lang/StellaOps.Scanner.Analyzers.Lang.Python/*.sig` (+ `.sha256`) — Cosign signatures for the Python analyzer DLL and manifest
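The reproducibility note on the bundle (mtime/uid/gid forced to zero) can be illustrated with a minimal Python sketch. This is not the code in `build_offline_kit.py`; the function name `build_deterministic_tar_gz` and the in-memory `files` mapping are assumptions chosen to keep the example self-contained. It shows the general technique only: sort the members, zero the ownership and timestamp fields, and pin the gzip header timestamp as well.

```python
import gzip
import io
import tarfile


def build_deterministic_tar_gz(files: dict) -> bytes:
    """Pack {archive_name: bytes} into a .tar.gz whose bytes depend only on the inputs.

    Hypothetical helper for illustration; not part of the StellaOps tooling.
    """
    raw = io.BytesIO()
    # The gzip header embeds its own mtime and file name, so pin both.
    with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):  # stable member order
                data = files[name]
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                info.mtime = 0          # no build timestamp
                info.uid = info.gid = 0  # no builder identity
                info.uname = info.gname = ""
                info.mode = 0o644
                tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()
```

Two runs over the same inputs then produce byte-identical archives, which is what allows a single recorded digest to stay valid across rebuilds.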
|
||||
|
||||
Provide `--cosign-key` / `--cosign-identity-token` (and optional `--cosign-password`) to generate Cosign signatures for both the tarball and manifest.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Download & verify
|
||||
|
||||
```bash
|
||||
curl -LO https://get.stella-ops.org/ouk/stella-ops-offline-kit-<DATE>.tgz
|
||||
curl -LO https://get.stella-ops.org/ouk/stella-ops-offline-kit-<DATE>.tgz.sig
|
||||
@@ -101,21 +145,21 @@ Example excerpt (2025-10-23 kit) showing the Go and .NET analyzer plug-in payloa
|
||||
}
|
||||
{
|
||||
"name": "plugins/scanner/analyzers/lang/StellaOps.Scanner.Analyzers.Lang.Python/StellaOps.Scanner.Analyzers.Lang.Python.dll",
|
||||
"sha256": "28b6e06c7cabf3b78f13f801cbb14962093f3d42c4ae9ec01babbcd14cda4644",
|
||||
"size": 53760,
|
||||
"capturedAt": "2025-10-23T00:00:00Z"
|
||||
"sha256": "a4f558f363394096e3dd6263f35b180b93b4112f9cf616c05872da8a8657d518",
|
||||
"size": 47104,
|
||||
"capturedAt": "2025-10-26T00:00:00Z"
|
||||
}
|
||||
{
|
||||
"name": "plugins/scanner/analyzers/lang/StellaOps.Scanner.Analyzers.Lang.Python/StellaOps.Scanner.Analyzers.Lang.Python.pdb",
|
||||
"sha256": "be4e34b4dc9a790fe1299e84213343b7c8ea90a2d22e5d7d1aa7585b8fedc946",
|
||||
"size": 34516,
|
||||
"capturedAt": "2025-10-23T00:00:00Z"
|
||||
"sha256": "ef2ad78bc2cd1d7e99bae000b92357aa9a9c32938501899e9033d001096196d0",
|
||||
"size": 31896,
|
||||
"capturedAt": "2025-10-26T00:00:00Z"
|
||||
}
|
||||
{
|
||||
"name": "plugins/scanner/analyzers/lang/StellaOps.Scanner.Analyzers.Lang.Python/manifest.json",
|
||||
"sha256": "bceea1e7542aae860b0ec5ba7b8b3aa960b21edc4d1efe60afc98ce289341ac3",
|
||||
"size": 671,
|
||||
"capturedAt": "2025-10-23T00:00:00Z"
|
||||
"sha256": "668ad9a1a35485628677b639db4d996d1e25f62021680a81a22482483800e557",
|
||||
"size": 648,
|
||||
"capturedAt": "2025-10-26T00:00:00Z"
|
||||
}
|
||||
```
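An operator who wants to spot-check entries like the ones in this excerpt could do so with a short Python sketch. The `name`, `sha256`, and `size` fields are taken from the excerpt; the function name `verify_entry` and the idea of pointing it at an extracted kit directory are assumptions for illustration, not a description of the shipped import tooling.

```python
import hashlib
from pathlib import Path


def verify_entry(kit_root: Path, entry: dict) -> list:
    """Return a list of problems for one manifest entry (empty when it checks out).

    Hypothetical helper; `kit_root` is assumed to be the extracted kit directory.
    """
    path = kit_root / entry["name"]
    if not path.is_file():
        return ["missing: " + entry["name"]]

    problems = []
    size = path.stat().st_size
    if size != entry["size"]:
        problems.append("size mismatch: expected %d, got %d" % (entry["size"], size))

    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Stream in 1 MiB chunks so large artefacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != entry["sha256"].lower():
        problems.append("sha256 mismatch for " + entry["name"])
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every bad entry in one pass.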
|
||||
|
||||
@@ -153,6 +197,21 @@ tar -tzf stella-ops-offline-kit-<DATE>.tgz 'plugins/scanner/analyzers/lang/Stell
|
||||
|
||||
The manifest lookup above and this `tar` listing should both surface the Go analyzer DLL, PDB, and manifest entries before the kit is promoted.
|
||||
|
||||
> **Release guardrail.** The automated release pipeline now publishes the Python plug-in from source and executes `dotnet run --project tools/LanguageAnalyzerSmoke --configuration Release -- --repo-root <checkout>` to validate manifest integrity and cold/warm determinism within the < 30 s / < 5 s budgets (differences versus repository goldens are logged for triage). Run `ops/offline-kit/run-python-analyzer-smoke.sh` locally before shipping a refreshed kit if you rebuild artefacts outside CI or when preparing the air-gap bundle.
|
||||
|
||||
### Debug store mirror
|
||||
|
||||
Offline symbols (`debug/.build-id/**`) must accompany every Offline Kit to keep symbol lookup deterministic. The release workflow is expected to emit `out/release/debug/` containing the build-id tree plus `debug-manifest.json` and its `.sha256` companion. After a release completes:
|
||||
|
||||
```bash
python ops/offline-kit/mirror_debug_store.py \
  --release-dir out/release \
  --offline-dir out/offline-kit \
  --summary out/offline-kit/metadata/debug-store.json
```
|
||||
|
||||
The script mirrors the debug tree into the Offline Kit staging directory, verifies SHA-256 values against the manifest, and writes a summary under `metadata/debug-store.json` for audit logs. If the release pipeline does not populate `out/release/debug`, the tooling now logs a warning (`DEVOPS-REL-17-004`)—treat it as a build failure and re-run the release once symbol extraction is enabled.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Delta patch workflow
|
||||
|
||||
@@ -123,6 +123,23 @@ details // structured conflict explanation / merge reasoning
|
||||
- Conflict explainers are serialized as deterministic `MergeConflictExplainerPayload` records (type, reason, source ranks, winning values); replay clients can parse the payload to render human-readable rationales without re-computing precedence.
|
||||
- Concelier.WebService exposes the immutable log via `GET /concelier/advisories/{vulnerabilityKey}/replay[?asOf=UTC_ISO8601]`, returning the latest statements (with hex-encoded hashes) and any conflict explanations for downstream exporters and APIs.
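As a rough illustration of the replay endpoint, the Python sketch below only builds the request URL. The route and the `asOf` parameter come from the bullet above; the base URL, the helper name `replay_url`, and the exact timestamp layout (second precision with a `Z` suffix) are assumptions, since the text only says "UTC ISO 8601".

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote, urlencode


def replay_url(base_url: str, vulnerability_key: str,
               as_of: Optional[datetime] = None) -> str:
    """Build the advisory replay URL; hypothetical helper for illustration."""
    key = quote(vulnerability_key, safe="")  # keep the key inside one path segment
    url = base_url.rstrip("/") + "/concelier/advisories/" + key + "/replay"
    if as_of is None:
        return url
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Assumed format: UTC, second precision, trailing 'Z'.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return url + "?" + urlencode({"asOf": stamp})
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC, which would replay the log as of the wrong instant.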
|
||||
|
||||
**AdvisoryObservation (new in Sprint 24)**
|
||||
|
||||
```
observationId   // deterministic id: {tenant}:{source}:{upstreamId}:{revision}
tenant          // issuing tenant (lower-case)
source{vendor,stream,api,collectorVersion}
upstream{
    upstreamId, documentVersion, contentHash,
    fetchedAt, receivedAt, signature{present,format,keyId,signature}}
content{format,specVersion,raw,metadata}
linkset{aliases[], purls[], cpes[], references[{type,url}]}
createdAt       // when Concelier recorded the observation
attributes      // optional provenance metadata (e.g., batch, connector)
```
|
||||
|
||||
The observation is an immutable projection of the raw ingestion document (post provenance validation, pre-merge) that powers Link‑Not‑Merge overlays and Vuln Explorer. Observations live in the `advisory_observations` collection, keyed by tenant + upstream identity. `linkset` provides normalized aliases/PURLs/CPEs that downstream services (Graph/Vuln Explorer) join against without triggering merge logic. Concelier.Core exposes strongly-typed models (`AdvisoryObservation`, `AdvisoryObservationLinkset`, etc.) and a Mongo-backed store for filtered queries by tenant/alias; this keeps overlay consumers read-only while preserving AOC guarantees.
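The deterministic `observationId` layout (`{tenant}:{source}:{upstreamId}:{revision}`, with a lower-case tenant) can be sketched in a few lines of Python. This is an illustration only: the class name `ObservationKey` is hypothetical, and because the text does not say which part of the `source` object or which revision marker feeds the id, both are treated here as opaque strings supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationKey:
    """Hypothetical holder for the four id components named in the schema comment."""
    tenant: str
    source: str       # assumption: an opaque source label
    upstream_id: str
    revision: str     # assumption: an opaque revision marker

    def observation_id(self) -> str:
        # The tenant is documented as lower-case; other parts are passed through as-is.
        return ":".join((self.tenant.lower(), self.source, self.upstream_id, self.revision))
```

Because the id is a pure function of its inputs, re-ingesting the same upstream document yields the same key.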
|
||||
|
||||
**ExportState**
|
||||
|
||||
```
|
||||
|
||||
@@ -76,8 +76,14 @@ At startup, services **self‑advertise** their semver & channel; the UI surface
|
||||
* **Unit/integration**: per‑component, plus **end‑to‑end** flows (scan→vex→policy→sign→attest).
|
||||
* **Perf SLOs**: hot paths (SBOM compose, diff, export) measured against budgets.
|
||||
* **Security**: dependency audit vs Concelier export; container hardening tests; minimal caps.
|
||||
* **Analyzer smoke**: restart-time language plug-ins (currently Python) verified via `dotnet run --project tools/LanguageAnalyzerSmoke` to ensure manifest integrity plus cold vs warm determinism (< 30 s / < 5 s budgets); the harness logs deviations from repository goldens for follow-up.
|
||||
* **Canary cohort**: internal staging + selected customers; one week on **edge** before **stable** tag.
|
||||
|
||||
### 2.5 Debug-store artefacts
|
||||
|
||||
* Every release exports stripped debug information for ELF binaries discovered in service images. Debug files follow the GNU build-id layout (`debug/.build-id/<aa>/<rest>.debug`) and are generated via `objcopy --only-keep-debug`.
|
||||
* `debug/debug-manifest.json` captures build-id → component/image/source mappings with SHA-256 checksums so operators can mirror the directory into debuginfod or offline symbol stores. The manifest (and its `.sha256` companion) ships with every release bundle and Offline Kit.
|
||||
|
||||
---
|
||||
|
||||
## 3) Distribution & activation
|
||||
@@ -92,7 +98,7 @@ At startup, services **self‑advertise** their semver & channel; the UI surface
|
||||
**Gating policy**:
|
||||
|
||||
* **Core images** (Authority, Scanner, Concelier, Excititor, Attestor, UI): public **read**.
|
||||
* **Enterprise add‑ons** (if any) and **pre‑release**: private repos via OAuth2 token service.
|
||||
* **Enterprise add‑ons** (if any) and **pre‑release**: private repos via the **Registry Token Service** (`src/StellaOps.Registry.TokenService`) which exchanges Authority-issued OpToks for short-lived Docker registry bearer tokens.
|
||||
|
||||
> Monetization lever is **signing** (PoE gate), not image pulls, so the core remains simple to consume.
|
||||
|
||||
@@ -105,6 +111,8 @@ At startup, services **self‑advertise** their semver & channel; the UI surface
|
||||
3. Registry allows pull for the requested repo.
|
||||
* Tokens are **short‑lived** (60–300 s) and **DPoP‑bound**.
|
||||
|
||||
The token service enforces plan gating via `registry-token.yaml` (see `docs/ops/registry-token-service.md`) and exposes Prometheus metrics (`registry_token_issued_total`, `registry_token_rejected_total`). Revoked licence identifiers halt issuance even when scope requirements are met.
|
||||
|
||||
### 3.3 Offline kits (air‑gapped)
|
||||
|
||||
* Tarball per release channel:
|
||||
|
||||
@@ -242,12 +242,15 @@ When `scanner.events.enabled = true`, the WebService serialises the signed repor
|
||||
|
||||
### 5.5 SBOM assembly & emit
|
||||
|
||||
* **Per‑layer SBOM fragments**: components introduced by the layer (+ relationships).
|
||||
* **Per-layer SBOM fragments**: components introduced by the layer (+ relationships).
|
||||
* **Image SBOMs**: merge fragments; refer back to them via **CycloneDX BOM‑Link** (or SPDX ExternalRef).
|
||||
* Emit both **Inventory** & **Usage** views.
|
||||
* When the native analyzer reports an ELF `buildId`, attach it to component metadata and surface it as `stellaops:buildId` in CycloneDX properties (and diff metadata). This keeps SBOM/diff output in lockstep with runtime events and the debug-store manifest.
|
||||
* Serialize **CycloneDX JSON** and **CycloneDX Protobuf**; optionally **SPDX 3.0.1 JSON**.
|
||||
* Build **BOM‑Index** sidecar: purl table + roaring bitmap; flag `usedByEntrypoint` components for fast backend joins.
|
||||
|
||||
The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
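To make the `stellaops:buildId` property concrete, here is a small Python sketch that attaches it to a CycloneDX-style component dictionary. The property name comes from the text above and the `properties` array of `name`/`value` pairs follows the CycloneDX JSON layout; the helper name `attach_build_id` and the sample component are assumptions, and the real emitter is the C# Scanner, not this code.

```python
def attach_build_id(component: dict, build_id: str) -> dict:
    """Return a copy of `component` carrying exactly one stellaops:buildId property.

    Hypothetical helper for illustration.
    """
    name = "stellaops:buildId"
    # Drop any stale value first so repeated calls stay idempotent.
    props = [p for p in component.get("properties", []) if p.get("name") != name]
    props.append({"name": name, "value": build_id.lower()})
    updated = dict(component)
    updated["properties"] = props
    return updated
```

Lower-casing the value keeps it comparable with the lower-case hex that the runtime Observer is described as sending.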
|
||||
|
||||
### 5.6 DSSE attestation (via Signer/Attestor)
|
||||
|
||||
* WebService constructs **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting **final reports**), timestamps.
|
||||
|
||||
@@ -141,10 +141,10 @@ stellaops/zastava-agent # System service; watch Docker events; observer on
|
||||
* Image signature presence (if cosign policies are local; else ask backend).
|
||||
* SBOM **referrers** presence (HEAD to registry, optional).
|
||||
* Rekor UUID known (query Scanner.WebService by image digest).
|
||||
* **Publish runtime events** to Scanner.WebService `/runtime/events` (batch & compress).
|
||||
* **Request delta scan** if: no SBOM in catalog OR base differs from known baseline.
|
||||
|
||||
### 3.2 Privileges & mounts (K8s)
|
||||
|
||||
* **SecurityContext:** `runAsUser: 0`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`.
|
||||
* **Capabilities:** `CAP_SYS_PTRACE` (optional if using nsenter trace), `CAP_DAC_READ_SEARCH`.
|
||||
@@ -154,12 +154,22 @@ stellaops/zastava-agent # System service; watch Docker events; observer on
|
||||
* `/run/containerd/containerd.sock` (or CRI‑O socket)
|
||||
* `/var/lib/containerd/io.containerd.runtime.v2.task` (rootfs paths & pids)
|
||||
* **Networking:** cluster‑internal egress to Scanner.WebService only.
|
||||
* **Rate limits:** hard caps for bytes hashed and file count per container to avoid noisy tenants.
|
||||
|
||||
### 3.3 Event batching
|
||||
|
||||
* Buffer ND‑JSON; flush by **N events** or **2 s**.
|
||||
* Backpressure: local disk ring buffer (50 MB default) if Scanner is temporarily unavailable; drop oldest after cap with **metrics** and **warning** event.
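The batching and backpressure rules above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Observer's implementation: the class names `EventBatcher` and `SpillBuffer` are hypothetical, the default of 100 events stands in for the unspecified **N**, the age check only runs when a new event arrives (a real agent would also flush from a timer), and the ring buffer lives in memory rather than on disk.

```python
import json
import time
from collections import deque
from typing import Callable, Deque, List, Optional


class EventBatcher:
    """Collects events and emits an ND-JSON payload by count or by age."""

    def __init__(self, max_events: int = 100, max_age_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_events = max_events
        self._max_age_s = max_age_s
        self._clock = clock
        self._lines: List[str] = []
        self._opened_at = 0.0

    def add(self, event: dict) -> Optional[str]:
        """Buffer one event; return a payload when a flush condition is met."""
        now = self._clock()
        if not self._lines:
            self._opened_at = now  # the batch age starts with its first event
        self._lines.append(json.dumps(event, separators=(",", ":"), sort_keys=True))
        too_many = len(self._lines) >= self._max_events
        too_old = now - self._opened_at >= self._max_age_s
        return self.flush() if too_many or too_old else None

    def flush(self) -> Optional[str]:
        if not self._lines:
            return None
        payload = "\n".join(self._lines) + "\n"
        self._lines = []
        return payload


class SpillBuffer:
    """Byte-capped FIFO that drops the oldest payloads once the cap is exceeded."""

    def __init__(self, cap_bytes: int = 50 * 1024 * 1024) -> None:
        self._cap = cap_bytes
        self._queue: Deque[str] = deque()
        self._bytes = 0
        self.dropped = 0  # would feed the drop metric / warning event

    def push(self, payload: str) -> None:
        self._queue.append(payload)
        self._bytes += len(payload.encode("utf-8"))
        # Always keep the newest payload, even if it alone exceeds the cap.
        while self._bytes > self._cap and len(self._queue) > 1:
            oldest = self._queue.popleft()
            self._bytes -= len(oldest.encode("utf-8"))
            self.dropped += 1

    def pop_oldest(self) -> Optional[str]:
        if not self._queue:
            return None
        payload = self._queue.popleft()
        self._bytes -= len(payload.encode("utf-8"))
        return payload
```

Injecting the clock keeps the age rule testable without sleeping, and counting drops rather than raising mirrors the "drop oldest with metrics and a warning" behaviour described above.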
|
||||
|
||||
### 3.4 Build-id capture & validation workflow
|
||||
|
||||
1. When Observer sees a `CONTAINER_START` it dereferences `/proc/<pid>/exe`, extracts the `NT_GNU_BUILD_ID` note, normalises it to lower-case hex, and sends it as `process.buildId` in the runtime envelope.
|
||||
2. Scanner.WebService persists the observation and propagates the most recent hashes into `/policy/runtime` responses (`buildIds` list) and policy caches consumed by the webhook/CLI.
|
||||
3. Release engineering copies the matching `.debug` files into the bundle (`debug/.build-id/<aa>/<rest>.debug`) and publishes `debug/debug-manifest.json` with per-hash digests. Offline Kit packaging reuses those artefacts verbatim (see `ops/offline-kit/mirror_debug_store.py`).
|
||||
4. Operators resolve symbols by either:
|
||||
* calling `stellaops-cli runtime policy test --image <digest>` to read the current `buildIds` and then fetching the corresponding `.debug` file from the bundle/offline mirror, or
|
||||
* piping the hash into `debuginfod-find debuginfo <buildId>` when a `debuginfod` service is wired against the mirrored tree.
|
||||
5. Missing hashes indicate stripped binaries without GNU notes; operators should trigger a rebuild with `-Wl,--build-id` or register a fallback symbol package as described in the runtime operations runbook.
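The lookup in step 4 comes down to normalising the build-id and mapping it onto the `debug/.build-id/<aa>/<rest>.debug` layout. The Python sketch below shows that mapping only. The function names and the validation rules (even-length hex, at least two bytes) are assumptions for illustration, and the sample hash is a placeholder rather than a real artefact.

```python
import re
from pathlib import PurePosixPath

_HEX = re.compile(r"^[0-9a-f]+$")


def normalise_build_id(raw: str) -> str:
    """Lower-case and sanity-check a GNU build-id given as hex (hypothetical helper)."""
    build_id = raw.strip().lower()
    if len(build_id) < 4 or len(build_id) % 2 != 0 or not _HEX.match(build_id):
        raise ValueError("not a hex build-id: %r" % raw)
    return build_id


def debug_path(raw_build_id: str, root: str = "debug") -> PurePosixPath:
    """Map a build-id to <root>/.build-id/<aa>/<rest>.debug."""
    build_id = normalise_build_id(raw_build_id)
    # First byte (two hex chars) is the directory, the remainder is the file stem.
    return PurePosixPath(root, ".build-id", build_id[:2], build_id[2:] + ".debug")
```

For example, `debug_path("5F0C7C3CB4D9F8A4F1C1D5C6B7E8F90123456789")` resolves to `debug/.build-id/5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug`, which can then be looked up in the release bundle or the Offline Kit mirror.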
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -36,6 +36,7 @@ Everything here is open‑source and versioned — when you check out a git ta
|
||||
- **07 – [High‑Level Architecture](07_HIGH_LEVEL_ARCHITECTURE.md)**
|
||||
- **08 – [Architecture Decision Records](adr/index.md)**
|
||||
- **08 – Module Architecture Dossiers**
|
||||
- [Architecture Overview](architecture/overview.md)
|
||||
- [Scanner](ARCHITECTURE_SCANNER.md)
|
||||
- [Concelier](ARCHITECTURE_CONCELIER.md)
|
||||
- [Excititor](ARCHITECTURE_EXCITITOR.md)
|
||||
@@ -43,6 +44,7 @@ Everything here is open‑source and versioned — when you check out a git ta
|
||||
- [Signer](ARCHITECTURE_SIGNER.md)
|
||||
- [Attestor](ARCHITECTURE_ATTESTOR.md)
|
||||
- [Authority](ARCHITECTURE_AUTHORITY.md)
|
||||
- [Policy Engine](architecture/policy-engine.md)
|
||||
- [Notify](ARCHITECTURE_NOTIFY.md)
|
||||
- [Scheduler](ARCHITECTURE_SCHEDULER.md)
|
||||
- [CLI](ARCHITECTURE_CLI.md)
|
||||
@@ -55,17 +57,32 @@ Everything here is open‑source and versioned — when you check out a git ta
|
||||
- **10 – [BuildX Generator Quickstart](dev/BUILDX_PLUGIN_QUICKSTART.md)**
|
||||
- **10 – [Scanner Cache Configuration](dev/SCANNER_CACHE_CONFIGURATION.md)**
|
||||
- **30 – [Excititor Connector Packaging Guide](dev/30_EXCITITOR_CONNECTOR_GUIDE.md)**
|
||||
- **31 – [Aggregation-Only Contract Reference](ingestion/aggregation-only-contract.md)**
|
||||
- **30 – Developer Templates**
|
||||
- [Excititor Connector Skeleton](dev/templates/excititor-connector/)
|
||||
- **11 – [Authority Service](11_AUTHORITY.md)**
|
||||
- **11 – [Data Schemas](11_DATA_SCHEMAS.md)**
|
||||
- **12 – [Performance Workbook](12_PERFORMANCE_WORKBOOK.md)**
|
||||
- **13 – [Release‑Engineering Playbook](13_RELEASE_ENGINEERING_PLAYBOOK.md)**
|
||||
- **20 – [CLI AOC Commands Reference](cli/cli-reference.md)**
|
||||
- **60 – [Policy Engine Overview](policy/overview.md)**
|
||||
- **61 – [Policy DSL Grammar](policy/dsl.md)**
|
||||
- **62 – [Policy Lifecycle & Approvals](policy/lifecycle.md)**
|
||||
- **63 – [Policy Runs & Orchestration](policy/runs.md)**
|
||||
- **64 – [Policy Engine REST API](api/policy.md)**
|
||||
- **65 – [Policy CLI Guide](cli/policy.md)**
|
||||
- **66 – [Policy Editor Workspace](ui/policy-editor.md)**
|
||||
- **67 – [Policy Observability](observability/policy.md)**
|
||||
- **68 – [Policy Governance & Least Privilege](security/policy-governance.md)**
|
||||
- **69 – [Policy Examples](examples/policies/README.md)**
|
||||
- **70 – [Policy FAQ](faq/policy-faq.md)**
|
||||
- **71 – [Policy Run DTOs](../src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-20-001-POLICY-RUNS.md)**
|
||||
- **30 – [Fixture Maintenance](dev/fixtures.md)**
|
||||
|
||||
### User & operator guides
|
||||
- **14 – [Glossary](14_GLOSSARY_OF_TERMS.md)**
|
||||
- **15 – [UI Guide](15_UI_GUIDE.md)**
|
||||
- **16 – [Console AOC Dashboard](ui/console.md)**
|
||||
- **17 – [Security Hardening Guide](17_SECURITY_HARDENING_GUIDE.md)**
|
||||
- **18 – [Coding Standards](18_CODING_STANDARDS.md)**
|
||||
- **19 – [Test‑Suite Overview](19_TEST_SUITE_OVERVIEW.md)**
|
||||
@@ -80,9 +97,19 @@ Everything here is open‑source and versioned — when you check out a git ta
|
||||
- **29 – [Concelier CISA ICS Connector Operations](ops/concelier-icscisa-operations.md)**
|
||||
- **30 – [Concelier CERT-Bund Connector Operations](ops/concelier-certbund-operations.md)**
|
||||
- **31 – [Concelier MSRC Connector – AAD Onboarding](ops/concelier-msrc-operations.md)**
|
||||
- **32 – [Scanner Analyzer Bench Operations](ops/scanner-analyzers-operations.md)**
|
||||
- **33 – [Scanner Artifact Store Migration](ops/scanner-rustfs-migration.md)**
|
||||
- **34 – [Zastava Runtime Operations Runbook](ops/zastava-runtime-operations.md)**
|
||||
- **35 – [Launch Readiness Checklist](ops/launch-readiness.md)**
|
||||
- **36 – [Launch Cutover Runbook](ops/launch-cutover.md)**
|
||||
- **37 – [Registry Token Service](ops/registry-token-service.md)**
|
||||
- **37 – [Deployment Upgrade & Rollback Runbook](ops/deployment-upgrade-runbook.md)**
|
||||
- **38 – [Policy Schema Export Automation](devops/policy-schema-export.md)**
|
||||
- **40 – [Observability Guide (AOC)](observability/observability.md)**
|
||||
- **41 – [Telemetry Collector Deployment](ops/telemetry-collector.md)**
|
||||
- **42 – [Telemetry Storage Deployment](ops/telemetry-storage.md)**
|
||||
- **43 – [Authority Scopes & Tenancy](security/authority-scopes.md)**
|
||||
- **44 – [Container Deployment (AOC)](deploy/containers.md)**
|
||||
|
||||
### Legal & licence
|
||||
- **32 – [Legal & Quota FAQ](29_LEGAL_FAQ_QUOTA.md)**
|
||||
|
||||
@@ -16,8 +16,9 @@
|
||||
| PLATFORM-EVENTS-09-401 | DONE (2025-10-21) | Platform Events Guild | DOCS-EVENTS-09-003 | Embed canonical event samples into contract/integration tests and ensure CI validates payloads against published schemas. | Notify models tests now run schema validation against `docs/events/*.json`, event schemas allow optional `attributes`, and docs capture the new validation workflow. |
|
||||
| RUNTIME-GUILD-09-402 | DONE (2025-10-19) | Runtime Guild | SCANNER-POLICY-09-107 | Confirm Scanner WebService surfaces `quietedFindingCount` and progress hints to runtime consumers; document readiness checklist. | Runtime verification run captures enriched payload; checklist/doc updates merged; stakeholders acknowledge availability. |
|
||||
| DOCS-CONCELIER-07-201 | DONE (2025-10-22) | Docs Guild, Concelier WebService | FEEDWEB-DOCS-01-001 | Final editorial review and publish pass for Concelier authority toggle documentation (Quickstart + operator guide). | Review feedback resolved, publish PR merged, release notes updated with documentation pointer. |
|
||||
| DOCS-RUNTIME-17-004 | TODO | Docs Guild, Runtime Guild | SCANNER-EMIT-17-701, ZASTAVA-OBS-17-005, DEVOPS-REL-17-002 | Document build-id workflows: SBOM exposure, runtime event payloads (`process.buildId`), Scanner `/policy/runtime` response (`buildIds` list), debug-store layout, and operator guidance for symbol retrieval. | Architecture + operator docs updated with build-id sections (Observer, Scanner, CLI), examples show `readelf` output + debuginfod usage, references linked from Offline Kit/Release guides + CLI help. |
|
||||
| DOCS-OBS-50-001 | TODO | Docs Guild, Observability Guild | TELEMETRY-OBS-50-001 | Publish `/docs/observability/overview.md` introducing scope, imposed rule banner, architecture diagram, and tenant guarantees. | Doc merged with imposed rule banner; diagram committed; cross-links to telemetry stack + evidence locker docs. |
|
||||
| DOCS-RUNTIME-17-004 | DONE (2025-10-26) | Docs Guild, Runtime Guild | SCANNER-EMIT-17-701, ZASTAVA-OBS-17-005, DEVOPS-REL-17-002 | Document build-id workflows: SBOM exposure, runtime event payloads (`process.buildId`), Scanner `/policy/runtime` response (`buildIds` list), debug-store layout, and operator guidance for symbol retrieval. | Architecture + operator docs updated with build-id sections (Observer, Scanner, CLI), examples show `readelf` output + debuginfod usage, references linked from Offline Kit/Release guides + CLI help. |
|
||||
| DOCS-OBS-50-001 | BLOCKED (2025-10-26) | Docs Guild, Observability Guild | TELEMETRY-OBS-50-001 | Publish `/docs/observability/overview.md` introducing scope, imposed rule banner, architecture diagram, and tenant guarantees. | Doc merged with imposed rule banner; diagram committed; cross-links to telemetry stack + evidence locker docs. |
|
||||
> Blocked: waiting on telemetry core deliverable (TELEMETRY-OBS-50-001) to finalise architecture details and diagrams.
|
||||
| DOCS-OBS-50-002 | TODO | Docs Guild, Security Guild | TELEMETRY-OBS-50-002 | Author `/docs/observability/telemetry-standards.md` detailing common fields, scrubbing policy, sampling defaults, and redaction override procedure. | Doc merged; imposed rule banner present; examples validated with telemetry fixtures; security review sign-off captured. |
|
||||
| DOCS-OBS-50-003 | TODO | Docs Guild, Observability Guild | TELEMETRY-OBS-50-001 | Create `/docs/observability/logging.md` covering structured log schema, dos/don'ts, tenant isolation, and copyable examples. | Doc merged with banner; sample logs redacted; lint passes; linked from coding standards. |
|
||||
| DOCS-OBS-50-004 | TODO | Docs Guild, Observability Guild | TELEMETRY-OBS-50-002 | Draft `/docs/observability/tracing.md` explaining context propagation, async linking, CLI header usage, and sampling strategies. | Doc merged; imposed rule banner included; diagrams updated; references to CLI/Console features added. |
|
||||
@@ -32,14 +33,17 @@
|
||||
| DOCS-CLI-OBS-52-001 | TODO | Docs Guild, DevEx/CLI Guild | CLI-OBS-52-001 | Create `/docs/cli/observability.md` detailing `stella obs` commands, examples, exit codes, imposed rule banner, and scripting tips. | Doc merged; examples tested; banner included; CLI parity matrix updated. |
|
||||
| DOCS-CLI-FORENSICS-53-001 | TODO | Docs Guild, DevEx/CLI Guild | CLI-FORENSICS-54-001 | Publish `/docs/cli/forensics.md` for snapshot/verify/attest commands with sample outputs, imposed rule banner, and offline workflows. | Doc merged; sample bundles verified; banner present; offline notes cross-linked. |
|
||||
| DOCS-RUNBOOK-55-001 | TODO | Docs Guild, Ops Guild | DEVOPS-OBS-55-001, WEB-OBS-55-001 | Author `/docs/runbooks/incidents.md` describing incident mode activation, escalation steps, retention impact, verification checklist, and imposed rule banner. | Doc merged; runbook rehearsed; banner included; linked from alerts. |
|
||||
| DOCS-AOC-19-001 | TODO | Docs Guild, Concelier Guild | CONCELIER-WEB-AOC-19-001, EXCITITOR-WEB-AOC-19-001 | Author `/docs/ingestion/aggregation-only-contract.md` covering philosophy, invariants, schemas, error codes, migration, observability, and security checklist. | New doc published with compliance checklist; cross-links from existing docs added. |
|
||||
| DOCS-AOC-19-002 | TODO | Docs Guild, Architecture Guild | DOCS-AOC-19-001 | Update `/docs/architecture/overview.md` to include AOC boundary, raw stores, and sequence diagram (fetch → guard → raw insert → policy evaluation). | Overview doc updated with diagrams/text; lint passes; stakeholders sign off. |
|
||||
| DOCS-AOC-19-003 | TODO | Docs Guild, Policy Guild | POLICY-AOC-19-003 | Refresh `/docs/architecture/policy-engine.md` clarifying ingestion boundary, raw inputs, and policy-only derived data. | Doc highlights raw-only ingestion contract, updated diagrams merge, compliance checklist added. |
|
||||
| DOCS-AOC-19-004 | TODO | Docs Guild, UI Guild | UI-AOC-19-001 | Extend `/docs/ui/console.md` with Sources dashboard tiles, violation drill-down workflow, and verification action. | UI doc updated with screenshots/flow descriptions, compliance checklist appended. |
| DOCS-AOC-19-005 | TODO | Docs Guild, CLI Guild | CLI-AOC-19-003 | Update `/docs/cli/cli-reference.md` with `stella sources ingest --dry-run` and `stella aoc verify` usage, exit codes, and offline notes. | CLI reference + quickstart sections updated; examples validated; compliance checklist added. |
| DOCS-AOC-19-006 | TODO | Docs Guild, Observability Guild | CONCELIER-WEB-AOC-19-002, EXCITITOR-WEB-AOC-19-002 | Document new metrics/traces/log keys in `/docs/observability/observability.md`. | Observability doc lists new metrics/traces/log fields; dashboards referenced; compliance checklist appended. |
| DOCS-AOC-19-007 | TODO | Docs Guild, Authority Core | AUTH-AOC-19-001 | Update `/docs/security/authority-scopes.md` with new ingestion scopes and tenancy enforcement notes. | Doc reflects new scopes, sample policies updated, compliance checklist added. |
| DOCS-AOC-19-008 | TODO | Docs Guild, DevOps Guild | DEVOPS-AOC-19-002 | Refresh `/docs/deploy/containers.md` to cover validator enablement, guard env flags, and read-only verify user. | Deploy doc updated; offline kit section mentions validator scripts; compliance checklist appended. |
| DOCS-AOC-19-001 | DONE (2025-10-26) | Docs Guild, Concelier Guild | CONCELIER-WEB-AOC-19-001, EXCITITOR-WEB-AOC-19-001 | Author `/docs/ingestion/aggregation-only-contract.md` covering philosophy, invariants, schemas, error codes, migration, observability, and security checklist. | New doc published with compliance checklist; cross-links from existing docs added. |
| DOCS-AOC-19-002 | DONE (2025-10-26) | Docs Guild, Architecture Guild | DOCS-AOC-19-001 | Update `/docs/architecture/overview.md` to include AOC boundary, raw stores, and sequence diagram (fetch → guard → raw insert → policy evaluation). | Overview doc updated with diagrams/text; lint passes; stakeholders sign off. |
| DOCS-AOC-19-003 | DONE (2025-10-26) | Docs Guild, Policy Guild | POLICY-AOC-19-003 | Refresh `/docs/architecture/policy-engine.md` clarifying ingestion boundary, raw inputs, and policy-only derived data. | Doc highlights raw-only ingestion contract, updated diagrams merge, compliance checklist added. |
| DOCS-AOC-19-004 | DONE (2025-10-26) | Docs Guild, UI Guild | UI-AOC-19-001 | Extend `/docs/ui/console.md` with Sources dashboard tiles, violation drill-down workflow, and verification action. | UI doc updated with screenshots/flow descriptions, compliance checklist appended. |
> DOCS-AOC-19-004: Architecture overview & policy-engine updates landed 2025-10-26; incorporate the new AOC boundary diagrams and metrics references.
| DOCS-AOC-19-005 | DONE (2025-10-26) | Docs Guild, CLI Guild | CLI-AOC-19-003 | Update `/docs/cli/cli-reference.md` with `stella sources ingest --dry-run` and `stella aoc verify` usage, exit codes, and offline notes. | CLI reference + quickstart sections updated; examples validated; compliance checklist added. |
> DOCS-AOC-19-005: New ingestion reference + architecture overview published 2025-10-26; ensure CLI docs link to both and surface AOC exit codes mapping.
| DOCS-AOC-19-006 | DONE (2025-10-26) | Docs Guild, Observability Guild | CONCELIER-WEB-AOC-19-002, EXCITITOR-WEB-AOC-19-002 | Document new metrics/traces/log keys in `/docs/observability/observability.md`. | Observability doc lists new metrics/traces/log fields; dashboards referenced; compliance checklist appended. |
| DOCS-AOC-19-007 | DONE (2025-10-26) | Docs Guild, Authority Core | AUTH-AOC-19-001 | Update `/docs/security/authority-scopes.md` with new ingestion scopes and tenancy enforcement notes. | Doc reflects new scopes, sample policies updated, compliance checklist added. |
| DOCS-AOC-19-008 | DONE (2025-10-26) | Docs Guild, DevOps Guild | DEVOPS-AOC-19-002 | Refresh `/docs/deploy/containers.md` to cover validator enablement, guard env flags, and read-only verify user. | Deploy doc updated; offline kit section mentions validator scripts; compliance checklist appended. |
| DOCS-AOC-19-009 | DONE (2025-10-26) | Docs Guild, Authority Core | AUTH-AOC-19-001 | Update AOC docs/samples to reflect new `advisory:*`, `vex:*`, and `aoc:verify` scopes. | Docs reference new scopes, samples aligned, compliance checklist updated. |

## Air-Gapped Mode (Epic 16)
| ID | Status | Owner(s) | Depends on | Description | Exit Criteria |
@@ -102,32 +106,23 @@

| ID | Status | Owner(s) | Depends on | Description | Exit Criteria |
|----|--------|----------|------------|-------------|---------------|
| DOCS-POLICY-20-001 | TODO | Docs Guild, Policy Guild | POLICY-ENGINE-20-000 | Author `/docs/policy/overview.md` covering concepts, inputs/outputs, determinism, and compliance checklist. | Doc published with diagrams + glossary; lint passes; checklist included. |
| DOCS-POLICY-20-002 | TODO | Docs Guild, Policy Guild | POLICY-ENGINE-20-001 | Write `/docs/policy/dsl.md` with grammar, built-ins, examples, anti-patterns. | DSL doc includes grammar tables, examples, compliance checklist; validated against parser tests. |
| DOCS-POLICY-20-003 | TODO | Docs Guild, Authority Core | AUTH-POLICY-20-001 | Publish `/docs/policy/lifecycle.md` describing draft→approve workflow, roles, audit, compliance list. | Lifecycle doc linked from UI/CLI help; approvals roles documented; checklist appended. |
| DOCS-POLICY-20-004 | TODO | Docs Guild, Scheduler Guild | SCHED-MODELS-20-001 | Create `/docs/policy/runs.md` detailing run modes, incremental mechanics, cursors, replay. | Run doc includes sequence diagrams + compliance checklist; cross-links to scheduler docs. |
| DOCS-POLICY-20-005 | TODO | Docs Guild, BE-Base Platform Guild | WEB-POLICY-20-001 | Draft `/docs/api/policy.md` describing endpoints, schemas, error codes. | API doc validated against OpenAPI; examples included; checklist appended. |
| DOCS-POLICY-20-006 | TODO | Docs Guild, DevEx/CLI Guild | CLI-POLICY-20-002 | Produce `/docs/cli/policy.md` with command usage, exit codes, JSON output contracts. | CLI doc includes examples, exit codes, compliance checklist. |
| DOCS-POLICY-20-007 | TODO | Docs Guild, UI Guild | UI-POLICY-20-001 | Document `/docs/ui/policy-editor.md` covering editor, simulation, diff workflows, approvals. | UI doc includes screenshots/placeholders, accessibility notes, compliance checklist. |
| DOCS-POLICY-20-008 | TODO | Docs Guild, Architecture Guild | POLICY-ENGINE-20-003 | Write `/docs/architecture/policy-engine.md` (new epic content) with sequence diagrams, selection strategy, schema. | Architecture doc merged with diagrams; compliance checklist appended; references updated. |
| DOCS-POLICY-20-009 | TODO | Docs Guild, Observability Guild | POLICY-ENGINE-20-007 | Add `/docs/observability/policy.md` for metrics/traces/logs, sample dashboards. | Observability doc includes metrics tables, dashboard screenshots, checklist. |
| DOCS-POLICY-20-010 | TODO | Docs Guild, Security Guild | AUTH-POLICY-20-002 | Publish `/docs/security/policy-governance.md` covering scopes, approvals, tenancy, least privilege. | Security doc merged; compliance checklist appended; reviewed by Security Guild. |
| DOCS-POLICY-20-011 | TODO | Docs Guild, Policy Guild | POLICY-ENGINE-20-001 | Populate `/docs/examples/policies/` with baseline/serverless/internal-only samples and commentary. | Example policies committed with explanations; lint passes; compliance checklist per file. |
| DOCS-POLICY-20-012 | TODO | Docs Guild, Support Guild | WEB-POLICY-20-003 | Draft `/docs/faq/policy-faq.md` addressing common pitfalls, VEX conflicts, determinism issues. | FAQ published with Q/A entries, cross-links, compliance checklist. |
| DOCS-POLICY-20-001 | DONE (2025-10-26) | Docs Guild, Policy Guild | POLICY-ENGINE-20-000 | Author `/docs/policy/overview.md` covering concepts, inputs/outputs, determinism, and compliance checklist. | Doc published with diagrams + glossary; lint passes; checklist included. |
| DOCS-POLICY-20-002 | DONE (2025-10-26) | Docs Guild, Policy Guild | POLICY-ENGINE-20-001 | Write `/docs/policy/dsl.md` with grammar, built-ins, examples, anti-patterns. | DSL doc includes grammar tables, examples, compliance checklist; validated against parser tests. |
| DOCS-POLICY-20-003 | DONE (2025-10-26) | Docs Guild, Authority Core | AUTH-POLICY-20-001 | Publish `/docs/policy/lifecycle.md` describing draft→approve workflow, roles, audit, compliance list. | Lifecycle doc linked from UI/CLI help; approvals roles documented; checklist appended. |
| DOCS-POLICY-20-004 | DONE (2025-10-26) | Docs Guild, Scheduler Guild | SCHED-MODELS-20-001 | Create `/docs/policy/runs.md` detailing run modes, incremental mechanics, cursors, replay. | Run doc includes sequence diagrams + compliance checklist; cross-links to scheduler docs. |
| DOCS-POLICY-20-005 | DONE (2025-10-26) | Docs Guild, BE-Base Platform Guild | WEB-POLICY-20-001 | Draft `/docs/api/policy.md` describing endpoints, schemas, error codes. | API doc validated against OpenAPI; examples included; checklist appended. |
| DOCS-POLICY-20-006 | DONE (2025-10-26) | Docs Guild, DevEx/CLI Guild | CLI-POLICY-20-002 | Produce `/docs/cli/policy.md` with command usage, exit codes, JSON output contracts. | CLI doc includes examples, exit codes, compliance checklist. |
| DOCS-POLICY-20-007 | DONE (2025-10-26) | Docs Guild, UI Guild | UI-POLICY-20-001 | Document `/docs/ui/policy-editor.md` covering editor, simulation, diff workflows, approvals. | UI doc includes screenshots/placeholders, accessibility notes, compliance checklist. |
| DOCS-POLICY-20-008 | DONE (2025-10-26) | Docs Guild, Architecture Guild | POLICY-ENGINE-20-003 | Write `/docs/architecture/policy-engine.md` (new epic content) with sequence diagrams, selection strategy, schema. | Architecture doc merged with diagrams; compliance checklist appended; references updated. |
| DOCS-POLICY-20-009 | DONE (2025-10-26) | Docs Guild, Observability Guild | POLICY-ENGINE-20-007 | Add `/docs/observability/policy.md` for metrics/traces/logs, sample dashboards. | Observability doc includes metrics tables, dashboard screenshots, checklist. |
| DOCS-POLICY-20-010 | DONE (2025-10-26) | Docs Guild, Security Guild | AUTH-POLICY-20-002 | Publish `/docs/security/policy-governance.md` covering scopes, approvals, tenancy, least privilege. | Security doc merged; compliance checklist appended; reviewed by Security Guild. |
| DOCS-POLICY-20-011 | DONE (2025-10-26) | Docs Guild, Policy Guild | POLICY-ENGINE-20-001 | Populate `/docs/examples/policies/` with baseline/serverless/internal-only samples and commentary. | Example policies committed with explanations; lint passes; compliance checklist per file. |
| DOCS-POLICY-20-012 | DONE (2025-10-26) | Docs Guild, Support Guild | WEB-POLICY-20-003 | Draft `/docs/faq/policy-faq.md` addressing common pitfalls, VEX conflicts, determinism issues. | FAQ published with Q/A entries, cross-links, compliance checklist. |

## Graph Explorer v1

| ID | Status | Owner(s) | Depends on | Description | Exit Criteria |
|----|--------|----------|------------|-------------|---------------|
| DOCS-GRAPH-21-001 | TODO | Docs Guild, Cartographer Guild | CARTO-GRAPH-21-001..006 | Author `/docs/graph/overview.md` covering concepts, snapshot lifecycle, overlays, and compliance checklist. | Doc merged with diagrams; lint passes; checklist appended. |
| DOCS-GRAPH-21-002 | TODO | Docs Guild | CARTO-GRAPH-21-001 | Write `/docs/graph/schema.md` describing node/edge/overlay schemas, indexes, sharding strategy, and sample docs. | Schema doc validated against fixtures; compliance checklist included. |
| DOCS-GRAPH-21-003 | TODO | Docs Guild, BE-Base Platform Guild | WEB-GRAPH-21-001..004 | Produce `/docs/graph/api.md` with endpoint specs, parameters, pagination, errors, and curl examples. | API doc aligns with OpenAPI; examples verified; checklist appended. |
| DOCS-GRAPH-21-004 | TODO | Docs Guild, UI Guild | UI-GRAPH-21-001..006 | Document `/docs/ui/graph-explorer.md` (screens, filters, paths, diff, accessibility). | UI doc published with screenshots/placeholders; accessibility checklist satisfied. |
| DOCS-GRAPH-21-005 | TODO | Docs Guild, DevEx/CLI Guild | CLI-GRAPH-21-001..003 | Create `/docs/cli/graph.md` detailing CLI commands, exit codes, JSON schemas. | CLI doc merged; examples validated; checklist appended. |
| DOCS-GRAPH-21-006 | TODO | Docs Guild, Architecture Guild | CARTO-GRAPH-21-002..007 | Draft `/docs/architecture/cartographer.md` covering build pipeline, layout tiling, overlay patching, sequence diagrams. | Architecture doc merged with diagrams; compliance checklist included. |
| DOCS-GRAPH-21-007 | TODO | Docs Guild, Observability Guild | CARTO-GRAPH-21-008, DEVOPS-GRAPH-21-001 | Publish `/docs/observability/graph.md` (metrics/traces/logs, dashboards, alerts). | Observability doc live; dashboards linked; checklist appended. |
| DOCS-GRAPH-21-008 | TODO | Docs Guild, Security Guild | AUTH-GRAPH-21-001..003 | Write `/docs/security/graph-access.md` describing RBAC, tenancy, scopes, service identities. | Security doc merged; reviewer checklist completed. |
| DOCS-GRAPH-21-009 | TODO | Docs Guild, Cartographer Guild | CARTO-GRAPH-21-006 | Document `/docs/examples/graph/` sample SBOMs, screenshots, exports with reviewer checklist. | Example docs + assets committed; lint passes; checklist appended. |

## Link-Not-Merge v1

@@ -143,17 +138,17 @@

| ID | Status | Owner(s) | Depends on | Description | Exit Criteria |
|----|--------|----------|------------|-------------|---------------|
| DOCS-CONSOLE-23-001 | TODO | Docs Guild, Console Guild | CONSOLE-CORE-23-004 | Publish `/docs/ui/console-overview.md` covering IA, tenant model, global filters, and AOC alignment with compliance checklist. | Doc merged with diagrams + overview tables; checklist appended; Console Guild sign-off. |
| DOCS-CONSOLE-23-002 | TODO | Docs Guild, Console Guild | DOCS-CONSOLE-23-001 | Author `/docs/ui/navigation.md` detailing routes, breadcrumbs, keyboard shortcuts, deep links, and tenant context switching. | Navigation doc merged with shortcut tables and screenshots; accessibility checklist satisfied. |
| DOCS-CONSOLE-23-003 | TODO | Docs Guild, SBOM Service Guild, Console Guild | SBOM-CONSOLE-23-001, CONSOLE-FEAT-23-102 | Document `/docs/ui/sbom-explorer.md` (catalog, detail, graph overlays, exports) including compliance checklist and performance tips. | Doc merged with annotated screenshots, export instructions, and overlay examples; checklist appended. |
| DOCS-CONSOLE-23-004 | TODO | Docs Guild, Concelier Guild, Excititor Guild | CONCELIER-CONSOLE-23-001, EXCITITOR-CONSOLE-23-001 | Produce `/docs/ui/advisories-and-vex.md` explaining aggregation-not-merge, conflict indicators, raw viewers, and provenance banners. | Doc merged; raw JSON examples included; compliance checklist complete. |
| DOCS-CONSOLE-23-005 | TODO | Docs Guild, Policy Guild | POLICY-CONSOLE-23-001, CONSOLE-FEAT-23-104 | Write `/docs/ui/findings.md` describing filters, saved views, explain drawer, exports, and CLI parity callouts. | Doc merged with filter matrix + explain walkthrough; checklist appended. |
| DOCS-CONSOLE-23-006 | TODO | Docs Guild, Policy Guild, Product Ops | POLICY-CONSOLE-23-002, CONSOLE-FEAT-23-105 | Publish `/docs/ui/policies.md` with editor, simulation, approvals, compliance checklist, and RBAC mapping. | Doc merged; Monaco screenshots + simulation diff examples included; approval flow described; checklist appended. |
| DOCS-CONSOLE-23-007 | TODO | Docs Guild, Scheduler Guild | SCHED-CONSOLE-23-001, CONSOLE-FEAT-23-106 | Document `/docs/ui/runs.md` covering queues, live progress, diffs, retries, evidence downloads, and troubleshooting. | Doc merged with SSE troubleshooting, metrics references, compliance checklist. |
| DOCS-CONSOLE-23-008 | TODO | Docs Guild, Authority Guild | AUTH-CONSOLE-23-002, CONSOLE-FEAT-23-108 | Draft `/docs/ui/admin.md` describing users/roles, tenants, tokens, integrations, fresh-auth prompts, and RBAC mapping. | Doc merged with tables for scopes vs roles, screenshots, compliance checklist. |
| DOCS-CONSOLE-23-009 | TODO | Docs Guild, DevOps Guild | DOWNLOADS-CONSOLE-23-001, CONSOLE-FEAT-23-109 | Publish `/docs/ui/downloads.md` listing product images, commands, offline instructions, parity with CLI, and compliance checklist. | Doc merged; manifest sample included; copy-to-clipboard guidance documented; checklist complete. |
| DOCS-CONSOLE-23-010 | TODO | Docs Guild, Deployment Guild, Console Guild | DEVOPS-CONSOLE-23-002, CONSOLE-REL-23-301 | Write `/docs/deploy/console.md` (Helm, ingress, TLS, CSP, env vars, health checks) with compliance checklist. | Deploy doc merged; templates validated; CSP guidance included; checklist appended. |
| DOCS-CONSOLE-23-011 | TODO | Docs Guild, Deployment Guild | DOCS-CONSOLE-23-010 | Update `/docs/install/docker.md` to cover Console image, Compose/Helm usage, offline tarballs, parity with CLI. | Doc updated with new sections; commands validated; compliance checklist appended. |
| DOCS-CONSOLE-23-001 | DONE (2025-10-26) | Docs Guild, Console Guild | CONSOLE-CORE-23-004 | Publish `/docs/ui/console-overview.md` covering IA, tenant model, global filters, and AOC alignment with compliance checklist. | Doc merged with diagrams + overview tables; checklist appended; Console Guild sign-off. |
| DOCS-CONSOLE-23-002 | DONE (2025-10-26) | Docs Guild, Console Guild | DOCS-CONSOLE-23-001 | Author `/docs/ui/navigation.md` detailing routes, breadcrumbs, keyboard shortcuts, deep links, and tenant context switching. | Navigation doc merged with shortcut tables and screenshots; accessibility checklist satisfied. |
| DOCS-CONSOLE-23-003 | DONE (2025-10-26) | Docs Guild, SBOM Service Guild, Console Guild | SBOM-CONSOLE-23-001, CONSOLE-FEAT-23-102 | Document `/docs/ui/sbom-explorer.md` (catalog, detail, graph overlays, exports) including compliance checklist and performance tips. | Doc merged with annotated screenshots, export instructions, and overlay examples; checklist appended. |
| DOCS-CONSOLE-23-004 | DONE (2025-10-26) | Docs Guild, Concelier Guild, Excititor Guild | CONCELIER-CONSOLE-23-001, EXCITITOR-CONSOLE-23-001 | Produce `/docs/ui/advisories-and-vex.md` explaining aggregation-not-merge, conflict indicators, raw viewers, and provenance banners. | Doc merged; raw JSON examples included; compliance checklist complete. |
| DOCS-CONSOLE-23-005 | DONE (2025-10-26) | Docs Guild, Policy Guild | POLICY-CONSOLE-23-001, CONSOLE-FEAT-23-104 | Write `/docs/ui/findings.md` describing filters, saved views, explain drawer, exports, and CLI parity callouts. | Doc merged with filter matrix + explain walkthrough; checklist appended. |
| DOCS-CONSOLE-23-006 | DONE (2025-10-26) | Docs Guild, Policy Guild, Product Ops | POLICY-CONSOLE-23-002, CONSOLE-FEAT-23-105 | Publish `/docs/ui/policies.md` with editor, simulation, approvals, compliance checklist, and RBAC mapping. | Doc merged; Monaco screenshots + simulation diff examples included; approval flow described; checklist appended. |
| DOCS-CONSOLE-23-007 | DONE (2025-10-26) | Docs Guild, Scheduler Guild | SCHED-CONSOLE-23-001, CONSOLE-FEAT-23-106 | Document `/docs/ui/runs.md` covering queues, live progress, diffs, retries, evidence downloads, and troubleshooting. | Doc merged with SSE troubleshooting, metrics references, compliance checklist. |
| DOCS-CONSOLE-23-008 | DONE (2025-10-26) | Docs Guild, Authority Guild | AUTH-CONSOLE-23-002, CONSOLE-FEAT-23-108 | Draft `/docs/ui/admin.md` describing users/roles, tenants, tokens, integrations, fresh-auth prompts, and RBAC mapping. | Doc merged with tables for scopes vs roles, screenshots, compliance checklist. |
| DOCS-CONSOLE-23-009 | DONE (2025-10-27) | Docs Guild, DevOps Guild | DOWNLOADS-CONSOLE-23-001, CONSOLE-FEAT-23-109 | Publish `/docs/ui/downloads.md` listing product images, commands, offline instructions, parity with CLI, and compliance checklist. | Doc merged; manifest sample included; copy-to-clipboard guidance documented; checklist complete. |
| DOCS-CONSOLE-23-010 | DONE (2025-10-27) | Docs Guild, Deployment Guild, Console Guild | DEVOPS-CONSOLE-23-002, CONSOLE-REL-23-301 | Write `/docs/deploy/console.md` (Helm, ingress, TLS, CSP, env vars, health checks) with compliance checklist. | Deploy doc merged; templates validated; CSP guidance included; checklist appended. |
| DOCS-CONSOLE-23-011 | DOING (2025-10-27) | Docs Guild, Deployment Guild | DOCS-CONSOLE-23-010 | Update `/docs/install/docker.md` to cover Console image, Compose/Helm usage, offline tarballs, parity with CLI. | Doc updated with new sections; commands validated; compliance checklist appended. |
| DOCS-CONSOLE-23-012 | TODO | Docs Guild, Security Guild | AUTH-CONSOLE-23-003, WEB-CONSOLE-23-002 | Publish `/docs/security/console-security.md` detailing OIDC flows, scopes, CSP, fresh-auth, evidence handling, and compliance checklist. | Security doc merged; threat model notes included; checklist appended. |
| DOCS-CONSOLE-23-013 | TODO | Docs Guild, Observability Guild | TELEMETRY-CONSOLE-23-001, CONSOLE-QA-23-403 | Write `/docs/observability/ui-telemetry.md` cataloguing metrics/logs/traces, dashboards, alerts, and feature flags. | Doc merged with instrumentation tables, dashboard screenshots, checklist appended. |
| DOCS-CONSOLE-23-014 | TODO | Docs Guild, Console Guild, CLI Guild | CONSOLE-DOC-23-502 | Maintain `/docs/cli-vs-ui-parity.md` matrix and integrate CI check guidance. | Matrix published with parity status, CI workflow documented, compliance checklist appended. |

@@ -9,3 +9,5 @@ The Aggregation-Only Contract keeps ingestion services deterministic and policy-
5. **Guardrails everywhere.** Roslyn analyzers, schema validators, and CI smoke tests should fail builds that attempt forbidden writes.

For detailed roles and ownership boundaries, see `AGENTS.md` at the repo root and the module-specific `ARCHITECTURE_*.md` dossiers.

Need the full contract? Read the [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md) for schemas, error codes, and migration guidance.

402
docs/api/policy.md
Normal file
@@ -0,0 +1,402 @@
# Policy Engine REST API

> **Audience:** Backend integrators, platform operators, and CI engineers invoking Policy Engine services programmatically.
> **Base URL:** `/api/policy/*` (internal gateway route) – requires OAuth 2.0 bearer token issued by Authority with scopes listed below.

This document is the canonical reference for the Policy Engine REST surface described in Epic 2 (Policy Engine v2). Use it alongside the [DSL](../policy/dsl.md), [Lifecycle](../policy/lifecycle.md), and [Runs](../policy/runs.md) guides for end-to-end implementations.

---

## 1 · Authentication & Headers

- **Auth:** Bearer tokens (`Authorization: Bearer <token>`) with the following scopes as applicable:
  - `policy:read`, `policy:write`, `policy:submit`, `policy:approve`, `policy:run`, `policy:activate`, `policy:archive`, `policy:simulate`, `policy:runs`
  - `findings:read` (for effective findings APIs)
  - `effective:write` (service identity only; not exposed to clients)
- **Service identity:** Authority marks the Policy Engine client with `properties.serviceIdentity: policy-engine`. Tokens missing this marker cannot obtain `effective:write`.
- **Tenant:** Supply tenant context via `X-Stella-Tenant`. Tokens without multi-tenant claims default to `default`.
- **Idempotency:** For mutating endpoints, include `Idempotency-Key` (UUID). Retries with the same key return the original result.
- **Content type:** All request/response bodies are `application/json; charset=utf-8` unless otherwise noted.

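The header rules above can be sketched as a small helper. This is a minimal illustration, not the project's client code: `build_headers` and its parameters are hypothetical, and sending `default` as the tenant when none is given is an assumption; only the header names and the content type come from this section.

```python
import uuid
from typing import Dict, Optional


def build_headers(token: str, tenant: str = "default", *, mutating: bool = False,
                  idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Assemble request headers for a Policy Engine call (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Stella-Tenant": tenant,  # assumption: always send the tenant explicitly
        "Content-Type": "application/json; charset=utf-8",
    }
    if mutating:
        # Reuse the caller's key on retries so the service can return the
        # original result; otherwise generate a fresh UUID for a new request.
        headers["Idempotency-Key"] = idempotency_key or str(uuid.uuid4())
    return headers
```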
---

## 2 · Error Model

All errors use HTTP semantics plus a structured payload:

```json
{
  "code": "ERR_POL_001",
  "message": "Policy syntax error",
  "details": [
    {"path": "rules[0].when", "error": "Unknown function foo()"}
  ],
  "traceId": "01HDV1C4E9Z4T5G6H7J8",
  "timestamp": "2025-10-26T14:07:03Z"
}
```

| Code | Meaning | Notes |
|------|---------|-------|
| `ERR_POL_001` | Policy syntax/compile error | Returned by `compile`, `submit`, `simulate`, `run` when DSL invalid. |
| `ERR_POL_002` | Policy not approved | Attempted to run or activate unapproved version. |
| `ERR_POL_003` | Missing inputs | Downstream service unavailable (Concelier/Excititor/SBOM). |
| `ERR_POL_004` | Determinism violation | Illegal API usage (wall-clock, RNG). Triggers incident mode. |
| `ERR_POL_005` | Unauthorized materialisation | Identity lacks `effective:write`. |
| `ERR_POL_006` | Run canceled or timed out | Includes cancellation metadata. |

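A client might decode the error payload along these lines. This is a sketch under stated assumptions: `PolicyError` and `parse_error` are hypothetical names, the meanings are copied from the table above, and every field other than `code` is treated as optional.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Meanings copied from the error table in this section.
ERROR_MEANINGS = {
    "ERR_POL_001": "Policy syntax/compile error",
    "ERR_POL_002": "Policy not approved",
    "ERR_POL_003": "Missing inputs",
    "ERR_POL_004": "Determinism violation",
    "ERR_POL_005": "Unauthorized materialisation",
    "ERR_POL_006": "Run canceled or timed out",
}


@dataclass(frozen=True)
class PolicyError:
    """Flattened view of the structured error payload (hypothetical type)."""
    code: str
    meaning: str
    message: str
    trace_id: str
    details: Tuple[str, ...]


def parse_error(body: str) -> PolicyError:
    """Parse a JSON error body into a PolicyError."""
    payload = json.loads(body)
    code = payload["code"]
    return PolicyError(
        code=code,
        meaning=ERROR_MEANINGS.get(code, "Unknown error code"),
        message=payload.get("message", ""),
        trace_id=payload.get("traceId", ""),
        # Render each detail entry as "path: error" for logging.
        details=tuple(
            f"{d.get('path', '?')}: {d.get('error', '')}"
            for d in payload.get("details", [])
        ),
    )
```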
---

## 3 · Policy Management

### 3.1 Create Draft

```
POST /api/policy/policies
Scopes: policy:write
```

**Request**

```json
{
  "policyId": "P-7",
  "name": "Default Org Policy",
  "description": "Baseline severity + VEX precedence",
  "dsl": {
    "syntax": "stella-dsl@1",
    "source": "policy \"Default Org Policy\" syntax \"stella-dsl@1\" { ... }"
  },
  "tags": ["baseline","vex"]
}
```

**Response 201**

```json
{
  "policyId": "P-7",
  "version": 1,
  "status": "draft",
  "digest": "sha256:7e1d…",
  "createdBy": "user:ali",
  "createdAt": "2025-10-26T13:40:00Z"
}
```

### 3.2 List Policies

```
GET /api/policy/policies?status=approved&tenant=default&page=1&pageSize=25
Scopes: policy:read
```

Returns a paginated list with an `X-Total-Count` header.

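As a rough sketch of page-based listing, the helpers below build the query string shown above and derive a page count from `X-Total-Count`. They assume the header carries the total number of matching policies (not the number of pages); both function names are hypothetical.

```python
import math
from urllib.parse import urlencode


def list_policies_url(status: str, tenant: str = "default",
                      page: int = 1, page_size: int = 25) -> str:
    """Build the List Policies URL with the query parameters from the example."""
    query = urlencode({"status": status, "tenant": tenant,
                       "page": page, "pageSize": page_size})
    return f"/api/policy/policies?{query}"


def total_pages(x_total_count: str, page_size: int) -> int:
    """Number of pages implied by the X-Total-Count header value."""
    return math.ceil(int(x_total_count) / page_size)
```

Keeping URL construction separate from the HTTP call makes the paging arithmetic easy to test without a live gateway.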
### 3.3 Fetch Version

```
GET /api/policy/policies/{policyId}/versions/{version}
Scopes: policy:read
```

Returns the full DSL, metadata, provenance, and simulation artefact references.

### 3.4 Update Draft Version

```
PUT /api/policy/policies/{policyId}/versions/{version}
Scopes: policy:write
```

The body is identical to Create Draft. Updates are only permitted while `status=draft`.

---

## 4 · Lifecycle Transitions

### 4.1 Submit for Review

```
POST /api/policy/policies/{policyId}/versions/{version}:submit
Scopes: policy:submit
```

**Request**

```json
{
  "reviewers": ["user:kay","group:sec-reviewers"],
  "notes": "Simulated on golden SBOM set (diff attached)",
  "simulationArtifacts": [
    "blob://policy/P-7/v3/simulations/2025-10-26.json"
  ]
}
```

**Response 202** – submission recorded. The `Location` header points to the review resource.

### 4.2 Review Feedback

```
POST /api/policy/policies/{policyId}/versions/{version}/reviews
Scopes: policy:review
```

**Request**

```json
{
  "decision": "approve", // approve | request_changes | comment
  "note": "Looks good, ensure incident playbook covers reachability data.",
  "blocking": false
}
```

### 4.3 Approve

```
POST /api/policy/policies/{policyId}/versions/{version}:approve
Scopes: policy:approve
```

The body requires an approval note and confirmation of the compliance gates:

```json
{
  "note": "All simulations and determinism checks passed.",
  "acknowledgeDeterminism": true,
  "acknowledgeSimulation": true
}
```

### 4.4 Activate

```
POST /api/policy/policies/{policyId}/versions/{version}:activate
Scopes: policy:activate, policy:run
```

Marks the version as active for the tenant; optionally triggers an immediate full run (`"runNow": true`).

### 4.5 Archive

```
POST /api/policy/policies/{policyId}/versions/{version}:archive
Scopes: policy:archive
```

The request includes `reason` and an optional `incidentId`.

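The colon-style transitions in this section share one URL shape, which the following sketch captures. It assumes that every scope listed for an endpoint is required (the section does not say whether `policy:activate, policy:run` means both or either), and `lifecycle_request` is a hypothetical helper, not part of any StellaOps client.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

# Action -> scopes listed for it in sections 4.1 and 4.3-4.5.
# Review feedback (4.2) is excluded because it uses a /reviews sub-resource.
LIFECYCLE_ACTIONS = {
    "submit": ("policy:submit",),
    "approve": ("policy:approve",),
    "activate": ("policy:activate", "policy:run"),
    "archive": ("policy:archive",),
}


def lifecycle_request(policy_id: str, version: int, action: str,
                      granted_scopes: Iterable[str]) -> Tuple[str, str]:
    """Return (method, path) for a lifecycle transition, checking scopes first."""
    if action not in LIFECYCLE_ACTIONS:
        raise ValueError(f"unknown lifecycle action: {action}")
    granted = set(granted_scopes)
    # Assumption: all listed scopes must be present on the token.
    missing = [s for s in LIFECYCLE_ACTIONS[action] if s not in granted]
    if missing:
        raise PermissionError(f"missing scopes: {', '.join(missing)}")
    path = f"/api/policy/policies/{quote(policy_id, safe='')}/versions/{version}:{action}"
    return "POST", path
```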
---

## 5 · Compilation & Validation

### 5.1 Compile

```
POST /api/policy/policies/{policyId}/versions/{version}:compile
Scopes: policy:write
```

**Response 200**

```json
{
  "digest": "sha256:7e1d…",
  "warnings": [],
  "rules": {
    "count": 24,
    "actions": {
      "block": 5,
      "warn": 4,
      "ignore": 3,
      "requireVex": 2
    }
  }
}
```

### 5.2 Lint / Simulate Quick Check

```
POST /api/policy/policies/{policyId}/lint
Scopes: policy:write
```

Slim wrapper used by the CLI; returns 204 on success or an `ERR_POL_001` payload on failure.

---

## 6 · Run & Simulation APIs

> Schema reference: canonical policy run request/status/diff payloads ship with the Scheduler Models guide (`src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-20-001-POLICY-RUNS.md`) and JSON fixtures under `samples/api/scheduler/policy-*.json`.

### 6.1 Trigger Run

```
POST /api/policy/policies/{policyId}/runs
Scopes: policy:run
```

**Request**

```json
{
  "mode": "incremental", // full | incremental
  "runId": "run:P-7:2025-10-26:auto", // optional idempotency key
  "sbomSet": ["sbom:S-42","sbom:S-318"],
  "env": {"exposure": "internet"},
  "priority": "normal" // normal | high | emergency
}
```

**Response 202**

```json
{
  "runId": "run:P-7:2025-10-26:auto",
  "status": "queued",
  "queuedAt": "2025-10-26T14:05:00Z"
}
```

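A caller could validate a run request before sending it, roughly as follows. The allowed `mode` and `priority` values come from the comments in the request example; `build_run_request` itself is a hypothetical helper, and treating `env` and `runId` as optional is an assumption.

```python
import json
from typing import Dict, Optional, Sequence

VALID_MODES = {"full", "incremental"}
VALID_PRIORITIES = {"normal", "high", "emergency"}


def build_run_request(mode: str, sbom_set: Sequence[str],
                      env: Optional[Dict[str, object]] = None,
                      priority: str = "normal",
                      run_id: Optional[str] = None) -> str:
    """Serialise a Trigger Run body, rejecting values the example does not list."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(VALID_PRIORITIES)}")
    body: Dict[str, object] = {
        "mode": mode,
        "sbomSet": list(sbom_set),
        "priority": priority,
    }
    if env is not None:
        body["env"] = dict(env)
    if run_id is not None:
        # The example describes runId as an optional idempotency key.
        body["runId"] = run_id
    return json.dumps(body, sort_keys=True)
```

Rejecting unknown enum values locally gives faster feedback than a round trip to the gateway, though the server remains the authority on what is valid.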
### 6.2 Get Run Status

```
GET /api/policy/policies/{policyId}/runs/{runId}
Scopes: policy:runs
```

Includes status, stats, determinism hash, and failure diagnostics.

### 6.3 List Runs

```
GET /api/policy/policies/{policyId}/runs?mode=incremental&status=failed&page=1&pageSize=20
Scopes: policy:runs
```

Supports filtering by `mode`, `status`, `from`/`to` timestamps, and `tenant`.

### 6.4 Simulate

```
POST /api/policy/policies/{policyId}/simulate
Scopes: policy:simulate
```

**Request**

```json
{
  "baseVersion": 3,
  "candidateVersion": 4,
  "sbomSet": ["sbom:S-42","sbom:S-318"],
  "env": {"sealed": false},
  "explain": true
}
```

**Response 200**

```json
{
  "diff": {
    "added": 12,
    "removed": 8,
    "unchanged": 657,
    "bySeverity": {
      "Critical": {"up": 1, "down": 0},
      "High": {"up": 3, "down": 4}
    }
  },
  "explainUri": "blob://policy/P-7/simulations/2025-10-26-4-vs-3.json"
}
```

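To illustrate how a CI job might consume the simulation response, the sketch below condenses the `diff` object. It assumes `up` and `down` are counts of findings that moved into or out of a severity bucket, which this document does not spell out; `summarise_diff` is a hypothetical name.

```python
from typing import Dict


def summarise_diff(response: Dict) -> Dict:
    """Condense a simulate response into a change count and net movement per severity."""
    diff = response["diff"]
    net_by_severity = {
        severity: counts.get("up", 0) - counts.get("down", 0)
        for severity, counts in diff.get("bySeverity", {}).items()
    }
    return {
        # Findings that differ between the base and candidate versions.
        "changed": diff["added"] + diff["removed"],
        "netBySeverity": net_by_severity,
    }
```

Applied to the sample response above, this yields 20 changed findings with a net movement of +1 for `Critical` and -1 for `High`.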
### 6.5 Replay Run

```
POST /api/policy/policies/{policyId}/runs/{runId}:replay
Scopes: policy:runs, policy:simulate
```

Produces a sealed bundle for determinism verification and returns the bundle's location.

---

## 7 · Effective Findings APIs

### 7.1 List Findings

```
GET /api/policy/findings/{policyId}?sbomId=S-42&status=affected&severity=High,Critical&page=1&pageSize=100
Scopes: findings:read
```

Response includes cursor-based pagination:

```json
{
  "items": [
    {
      "findingId": "P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337",
      "status": "affected",
      "severity": {"normalized": "High", "score": 7.5},
      "sbomId": "sbom:S-42",
      "advisoryIds": ["CVE-2021-23337"],
      "vex": {"winningStatementId": "VendorX-123"},
      "policyVersion": 4,
      "updatedAt": "2025-10-26T14:06:01Z"
    }
  ],
  "nextCursor": "eyJwYWdlIjoxfQ=="
}
```

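Following `nextCursor` can be sketched as a generator. The transport is injected as `fetch_page`, a hypothetical callable, because this document does not name the query parameter that carries the cursor; the sketch also assumes that a missing or empty `nextCursor` marks the last page.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_findings(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every finding, requesting pages until no cursor is returned.

    fetch_page(cursor) must return one decoded response body; it is called
    with None for the first page and with the previous nextCursor afterwards.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:  # assumption: absent or empty cursor means last page
            break
```

Treating the cursor as an opaque string, rather than decoding it, keeps the client independent of however the server chooses to encode its position.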
### 7.2 Fetch Explain Trace

```
GET /api/policy/findings/{policyId}/{findingId}/explain?mode=verbose
Scopes: findings:read
```

Returns rule hit sequence:

```json
{
  "findingId": "P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337",
  "policyVersion": 4,
  "steps": [
    {"rule": "vex_precedence", "status": "not_affected", "inputs": {"statementId": "VendorX-123"}},
    {"rule": "severity_baseline", "severity": {"normalized": "Low", "score": 3.4}}
  ],
  "sealedHints": [{"message": "Using cached EPSS percentile from bundle 2025-10-20"}]
}
```

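A trace like the one above can be flattened into readable lines, for example in CLI output. This is a minimal sketch; `describe_steps` is a hypothetical helper and it only handles the `rule`, `status`, and `severity` keys that appear in the sample.

```python
from typing import Dict, List


def describe_steps(trace: Dict) -> List[str]:
    """Render each rule hit as one numbered line, in evaluation order."""
    lines = []
    for index, step in enumerate(trace.get("steps", []), start=1):
        parts = [f"{index}. {step['rule']}"]
        if "status" in step:
            parts.append(f"status={step['status']}")
        severity = step.get("severity")
        if severity:
            parts.append(f"severity={severity['normalized']} ({severity['score']})")
        lines.append(" ".join(parts))
    return lines
```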
---
|
||||
|
||||
## 8 · Events & Webhooks
|
||||
|
||||
- `policy.run.completed` – emitted with `runId`, `policyId`, `mode`, `stats`, `determinismHash`.
|
||||
- `policy.run.failed` – includes error code, retry count, guidance.
|
||||
- `policy.lifecycle.*` – mirrored from lifecycle APIs (see [Lifecycle guide](../policy/lifecycle.md)).
|
||||
- Webhook registration will be handled via `/api/policy/webhooks` (reserved for future work). For now, integrate with the Notifier streams documented in `/docs/notifications/*`.
|
||||
---
|
||||
|
||||
## 9 · Compliance Checklist
|
||||
|
||||
- [ ] **Scopes enforced:** Endpoint access requires correct Authority scope mapping (see `/src/StellaOps.Authority/TASKS.md`).
|
||||
- [ ] **Schemas current:** JSON examples align with Scheduler Models (`SCHED-MODELS-20-001`) and Policy Engine DTOs; update when contracts change.
|
||||
- [ ] **Error codes mapped:** `ERR_POL_*` table reflects implementation and CI tests cover edge cases.
|
||||
- [ ] **Pagination documented:** List endpoints specify page/size and cursor semantics; responses include `X-Total-Count` or `nextCursor`.
|
||||
- [ ] **Idempotency described:** Mutating endpoints mandate `Idempotency-Key`.
|
||||
- [ ] **Offline parity noted:** Simulate/run endpoints explain `--sealed` behaviour and bundle generation.
|
||||
- [ ] **Cross-links added:** References to lifecycle, runs, DSL, and observability docs verified.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 20).*
|
||||
168
docs/architecture/overview.md
Normal file
@@ -0,0 +1,168 @@
|
||||
# StellaOps Architecture Overview (Sprint 19)
|
||||
|
||||
> **Ownership:** Architecture Guild • Docs Guild
|
||||
> **Audience:** Service owners, platform engineers, solution architects
|
||||
> **Related:** [High-Level Architecture](../07_HIGH_LEVEL_ARCHITECTURE.md), [Concelier Architecture](../ARCHITECTURE_CONCELIER.md), [Policy Engine Architecture](policy-engine.md), [Aggregation-Only Contract](../ingestion/aggregation-only-contract.md)
|
||||
|
||||
This dossier summarises the end-to-end runtime topology after the Aggregation-Only Contract (AOC) rollout. It highlights where raw facts live, how ingest services enforce guardrails, and how downstream components consume those facts to derive policy decisions and user-facing experiences.
|
||||
|
||||
---
|
||||
|
||||
## 1 · System landscape
|
||||
|
||||
```mermaid
graph TD
    subgraph Edge["Clients & Automation"]
        CLI[stella CLI]
        UI[Console SPA]
        APIClients[CI / API Clients]
    end
    Gateway["API Gateway<br/>(JWT + DPoP scopes)"]
    subgraph Scanner["Fact Collection"]
        ScannerWeb[Scanner.WebService]
        ScannerWorkers[Scanner.Workers]
        Agent[Agent Runtime]
    end
    subgraph Ingestion["Aggregation-Only Ingestion (AOC)"]
        Concelier[Concelier.WebService]
        Excititor[Excititor.WebService]
        RawStore[(MongoDB<br/>advisory_raw / vex_raw)]
    end
    subgraph Derivation["Policy & Overlay"]
        Policy[Policy Engine]
        Scheduler[Scheduler Services]
        Notify[Notifier]
    end
    subgraph Experience["UX & Export"]
        UIService[Console Backend]
        Exporters[Export / Offline Kit]
    end
    Observability[Telemetry Stack]

    CLI --> Gateway
    UI --> Gateway
    APIClients --> Gateway
    Gateway --> ScannerWeb
    ScannerWeb --> ScannerWorkers
    ScannerWorkers --> Concelier
    ScannerWorkers --> Excititor
    Concelier --> RawStore
    Excititor --> RawStore
    RawStore --> Policy
    Policy --> Scheduler
    Policy --> Notify
    Policy --> UIService
    Scheduler --> UIService
    UIService --> Exporters
    Exporters --> CLI
    Exporters --> Offline[Offline Kit]
    Observability -.-> ScannerWeb
    Observability -.-> Concelier
    Observability -.-> Excititor
    Observability -.-> Policy
    Observability -.-> Scheduler
    Observability -.-> Notify
```
|
||||
Key boundaries:
|
||||
|
||||
- **AOC border.** Everything inside the Ingestion subgraph writes only immutable raw facts plus link hints. Derived severity, consensus, and risk remain outside the border.
|
||||
- **Policy-only derivation.** Policy Engine materialises `effective_finding_*` collections and emits overlays; other services consume but never mutate them.
|
||||
- **Tenant enforcement.** Authority-issued DPoP scopes flow through Gateway to every service; every document in the raw stores and overlays carries a mandatory `tenant` field.
|
||||
---
|
||||
|
||||
## 2 · Aggregation-Only Contract focus
|
||||
|
||||
### 2.1 Responsibilities at the boundary
|
||||
|
||||
| Area | Services | Responsibilities under AOC | Forbidden under AOC |
|------|----------|-----------------------------|---------------------|
| **Ingestion (Concelier / Excititor)** | `StellaOps.Concelier.WebService`, `StellaOps.Excititor.WebService` | Fetch upstream advisories/VEX, verify signatures, compute linksets, append immutable documents to `advisory_raw` / `vex_raw`, emit observability signals, expose raw read APIs. | Computing severity, consensus, suppressions, or policy hints; merging upstream sources into a single derived record; mutating existing documents. |
| **Policy & Overlay** | `StellaOps.Policy.Engine`, Scheduler | Join SBOM inventory with raw advisories/VEX, evaluate policies, issue `effective_finding_*` overlays, drive remediation workflows. | Writing to raw collections; bypassing guard scopes; running without recorded provenance. |
| **Experience layers** | Console, CLI, Exporters | Surface raw facts + policy overlays; run `stella aoc verify`; render AOC dashboards and reports. | Accepting ingestion payloads that lack provenance or violate guard results. |
|
||||
### 2.2 Raw stores
|
||||
|
||||
| Collection | Purpose | Key fields | Notes |
|------------|---------|------------|-------|
| `advisory_raw` | Immutable vendor/ecosystem advisory documents. | `_id`, `tenant`, `source.*`, `upstream.*`, `content.raw`, `linkset`, `supersedes`. | Idempotent by `(source.vendor, upstream.upstream_id, upstream.content_hash)`. |
| `vex_raw` | Immutable vendor VEX statements. | Mirrors `advisory_raw`; `identifiers.statements` summarises affected components. | Maintains supersedes chain identical to advisory flow. |
| Change streams (`advisory_raw_stream`, `vex_raw_stream`) | Feed Policy Engine and Scheduler. | `operationType`, `documentKey`, `fullDocument`, `tenant`, `traceId`. | Scope filtered per tenant before delivery. |
|
||||
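The idempotency tuple listed for `advisory_raw` can be pictured as a uniqueness key checked on every append. The following is a minimal in-memory sketch, not the MongoDB implementation: `InMemoryRawStore` is a hypothetical stand-in, and the advisory ID and hashes are placeholders.

```python
from typing import Dict, Tuple

IdempotencyKey = Tuple[str, str, str]


def idempotency_key(doc: dict) -> IdempotencyKey:
    """Build (source.vendor, upstream.upstream_id, upstream.content_hash)."""
    return (
        doc["source"]["vendor"],
        doc["upstream"]["upstream_id"],
        doc["upstream"]["content_hash"],
    )


class InMemoryRawStore:
    """Hypothetical append-only store; stored documents are never modified."""

    def __init__(self) -> None:
        self._docs: Dict[IdempotencyKey, dict] = {}

    def append(self, doc: dict) -> bool:
        """Return True if stored, False if this exact revision already exists."""
        key = idempotency_key(doc)
        if key in self._docs:
            return False  # replayed fetch: no second copy, no mutation
        self._docs[key] = doc
        return True

    def __len__(self) -> int:
        return len(self._docs)


store = InMemoryRawStore()
revision = {
    "source": {"vendor": "redhat"},
    # Placeholder identifiers, not real advisory data.
    "upstream": {"upstream_id": "EXAMPLE-0001", "content_hash": "sha256:aaa"},
}
first = store.append(revision)
second = store.append(dict(revision))  # same content fetched again
changed = store.append(
    {**revision, "upstream": {**revision["upstream"], "content_hash": "sha256:bbb"}}
)
```

Because the content hash is part of the key, a changed upstream document lands as a new record instead of overwriting the old one, which is what keeps the collection append-only.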
### 2.3 Guarded ingestion sequence
|
||||
|
||||
```mermaid
sequenceDiagram
    participant Upstream as Upstream Source
    participant Connector as Concelier/Excititor Connector
    participant Guard as AOCWriteGuard
    participant Mongo as MongoDB (advisory_raw / vex_raw)
    participant Stream as Change Stream
    participant Policy as Policy Engine

    Upstream-->>Connector: CSAF / OSV / VEX document
    Connector->>Connector: Normalize transport, compute content_hash
    Connector->>Guard: Candidate raw doc (source + upstream + content + linkset)
    Guard-->>Connector: ERR_AOC_00x on violation
    Guard->>Mongo: Append immutable document (with tenant & supersedes)
    Mongo-->>Stream: Change event (tenant scoped)
    Stream->>Policy: Raw delta payload
    Policy->>Policy: Evaluate policies, compute effective findings
```
|
||||
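The guard step in the sequence above can be sketched as a pure function that either returns the document to append or raises. This is an assumption-laden illustration: the forbidden field names are guesses derived from the "Forbidden under AOC" column, the literal `ERR_AOC_00x` stands in for real codes that this document does not enumerate, and `guard` / `AocViolation` are hypothetical names.

```python
from typing import Optional

# Illustrative names for derived data that ingestion must not write;
# the real guard's list is not given in this document.
FORBIDDEN_FIELDS = frozenset({"severity", "consensus", "suppressions", "policy_hints"})
REQUIRED_SECTIONS = ("source", "upstream", "content", "linkset")


class AocViolation(Exception):
    """Raised instead of writing; `code` stands in for a real ERR_AOC_00x value."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def guard(candidate: dict, tenant: Optional[str]) -> dict:
    """Return the document to append, or raise AocViolation."""
    if not tenant:
        raise AocViolation("ERR_AOC_00x", "tenant claim is required")
    missing = [name for name in REQUIRED_SECTIONS if name not in candidate]
    if missing:
        raise AocViolation("ERR_AOC_00x", f"missing sections: {missing}")
    forbidden = sorted(FORBIDDEN_FIELDS.intersection(candidate))
    if forbidden:
        raise AocViolation("ERR_AOC_00x", f"forbidden derived fields: {forbidden}")
    # Copy rather than mutate the caller's object, then stamp the tenant.
    return {**candidate, "tenant": tenant}


clean = {"source": {}, "upstream": {}, "content": {}, "linkset": {}}
accepted = guard(clean, "default")
try:
    guard({**clean, "severity": "High"}, "default")
    rejected_code = None
except AocViolation as violation:
    rejected_code = violation.code
```

Keeping the guard side-effect free means the same checks can be replayed later against stored documents, which is how a verify command could reuse them.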
---
|
||||
|
||||
### 2.4 Authority scopes & tenancy
|
||||
|
||||
| Scope | Holder | Purpose | Notes |
|-------|--------|---------|-------|
| `advisory:write` / `vex:write` | Concelier / Excititor collectors | Append raw documents through ingestion endpoints. | Paired with tenant claims; requests without tenant are rejected. |
| `advisory:verify` / `vex:verify` | DevOps verify identity, CLI | Run `stella aoc verify` or call `/aoc/verify`. | Read-only; cannot mutate raw docs. |
| `effective:write` | Policy Engine | Materialise `effective_finding_*` overlays. | Only Policy Engine identity may hold; ingestion contexts receive `ERR_AOC_006` if they attempt. |
| `effective:read` | Console, CLI, exports | Consume derived findings. | Enforced by Gateway and downstream services. |
|
||||
---
|
||||
|
||||
## 3 · Data & control flow highlights
|
||||
|
||||
1. **Ingestion:** Concelier / Excititor connectors fetch upstream documents, compute linksets, and hand payloads to `AOCWriteGuard`. Guards validate schema, provenance, forbidden fields, supersedes pointers, and append-only rules before writing to Mongo.
|
||||
2. **Verification:** `stella aoc verify` (CLI/CI) and `/aoc/verify` endpoints replay guard checks against stored documents, mapping `ERR_AOC_00x` codes to exit codes for automation.
|
||||
3. **Policy evaluation:** Mongo change streams deliver tenant-scoped raw deltas. Policy Engine joins SBOM inventory (via BOM Index), executes deterministic policies, writes overlays, and emits events to Scheduler/Notify.
|
||||
4. **Experience surfaces:** Console renders an AOC dashboard showing ingestion latency, guard violations, and supersedes depth. CLI exposes raw-document fetch helpers for auditing. Offline Kit bundles raw collections alongside guard configs to keep air-gapped installs verifiable.
|
||||
5. **Observability:** All services emit `ingestion_write_total`, `aoc_violation_total{code}`, `ingestion_latency_seconds`, and trace spans `ingest.fetch`, `ingest.transform`, `ingest.write`, `aoc.guard`. Logs correlate via `traceId`, `tenant`, `source.vendor`, and `content_hash`.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Offline & disaster readiness
|
||||
|
||||
- **Offline Kit:** Packages raw Mongo snapshots (`advisory_raw`, `vex_raw`) plus guard configuration and CLI verifier binaries so air-gapped sites can re-run AOC checks before promotion.
|
||||
- **Recovery:** Supersedes chains allow rollback to prior revisions without mutating documents. Disaster exercises must rehearse restoring from snapshot, replaying change streams into Policy Engine, and re-validating guard compliance.
|
||||
- **Migration:** Legacy normalised fields are moved to temporary views during cutover; the ingestion runtime stops writing them once the guard-enforced path is live (see [Migration playbook](../ingestion/aggregation-only-contract.md#8-migration-playbook)).
|
||||
---
|
||||
|
||||
## 5 · References
|
||||
|
||||
- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
|
||||
- [Concelier architecture](../ARCHITECTURE_CONCELIER.md)
|
||||
- [Excititor architecture](../ARCHITECTURE_EXCITITOR.md)
|
||||
- [Policy Engine architecture](policy-engine.md)
|
||||
- [Authority service](../ARCHITECTURE_AUTHORITY.md)
|
||||
- [Observability standards (upcoming)](../observability/policy.md) – interim reference for telemetry naming.
|
||||
|
||||
---
|
||||
|
||||
## 6 · Compliance checklist
|
||||
|
||||
- [ ] AOC guard enabled for all Concelier and Excititor write paths in production.
|
||||
- [ ] Mongo schema validators deployed for `advisory_raw` and `vex_raw`; change streams scoped per tenant.
|
||||
- [ ] Authority scopes (`advisory:*`, `vex:*`, `effective:*`) configured in Gateway and validated via integration tests.
|
||||
- [ ] `stella aoc verify` wired into CI/CD pipelines with seeded violation fixtures.
|
||||
- [ ] Console AOC dashboard and CLI documentation reference the new ingestion contract.
|
||||
- [ ] Offline Kit bundles include guard configs, verifier tooling, and documentation updates.
|
||||
- [ ] Observability dashboards include violation, latency, and supersedes depth metrics with alert thresholds.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 19).*
|
||||
243
docs/architecture/policy-engine.md
Normal file
@@ -0,0 +1,243 @@
|
||||
# Policy Engine Architecture (v2)
|
||||
|
||||
> **Ownership:** Policy Guild • Platform Guild
|
||||
> **Services:** `StellaOps.Policy.Engine` (Minimal API + worker host)
|
||||
> **Data Stores:** MongoDB (`policies`, `policy_runs`, `effective_finding_*`), Object storage (explain bundles), optional NATS/Mongo queue
|
||||
> **Related docs:** [Policy overview](../policy/overview.md), [DSL](../policy/dsl.md), [Lifecycle](../policy/lifecycle.md), [Runs](../policy/runs.md), [REST API](../api/policy.md), [Policy CLI](../cli/policy.md), [Architecture overview](../architecture/overview.md), [AOC reference](../ingestion/aggregation-only-contract.md)
|
||||
|
||||
This dossier describes the internal structure of the Policy Engine service delivered in Epic 2. It focuses on module boundaries, deterministic evaluation, orchestration, and integration contracts with Concelier, Excititor, SBOM Service, Authority, Scheduler, and Observability stacks.
|
||||
|
||||
The service operates strictly downstream of the **Aggregation-Only Contract (AOC)**. It consumes immutable `advisory_raw` and `vex_raw` documents emitted by Concelier and Excititor, derives findings inside Policy-owned collections, and never mutates ingestion stores. Refer to the architecture overview and AOC reference for system-wide guardrails and provenance obligations.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Responsibilities & Constraints
|
||||
|
||||
- Compile and evaluate `stella-dsl@1` policy packs into deterministic verdicts.
|
||||
- Join SBOM inventory, Concelier advisories, and Excititor VEX evidence via canonical linksets and equivalence tables.
|
||||
- Materialise effective findings (`effective_finding_{policyId}`) with append-only history and produce explain traces.
|
||||
- Operate incrementally: react to change streams (advisory/vex/SBOM deltas) with ≤ 5 min SLA.
|
||||
- Provide simulations with diff summaries for UI/CLI workflows without modifying state.
|
||||
- Enforce strict determinism guard (no wall-clock, RNG, network beyond allow-listed services) and RBAC + tenancy via Authority scopes.
|
||||
- Support sealed/air-gapped deployments with offline bundles and sealed-mode hints.
|
||||
|
||||
Non-goals: policy authoring UI (handled by Console), ingestion or advisory normalisation (Concelier), VEX consensus (Excititor), runtime enforcement (Zastava).
|
||||
|
||||
---
|
||||
|
||||
## 2 · High-Level Architecture
|
||||
|
||||
```mermaid
graph TD
    subgraph Clients
        CLI[stella CLI]
        UI[Console Policy Editor]
        CI[CI Pipelines]
    end
    subgraph PolicyEngine["StellaOps.Policy.Engine"]
        API[Minimal API Host]
        Orchestrator[Run Orchestrator]
        WorkerPool[Evaluation Workers]
        Compiler[DSL Compiler Cache]
        Materializer[Effective Findings Writer]
    end
    subgraph RawStores["Raw Stores (AOC)"]
        AdvisoryRaw[(MongoDB<br/>advisory_raw)]
        VexRaw[(MongoDB<br/>vex_raw)]
    end
    subgraph Derived["Derived Stores"]
        Mongo[(MongoDB<br/>policies / policy_runs / effective_finding_*)]
        Blob[(Object Store / Evidence Locker)]
        Queue[(Mongo Queue / NATS)]
    end
    Concelier[(Concelier APIs)]
    Excititor[(Excititor APIs)]
    SBOM[(SBOM Service)]
    Authority[(Authority / DPoP Gateway)]

    CLI --> API
    UI --> API
    CI --> API
    API --> Compiler
    API --> Orchestrator
    Orchestrator --> Queue
    Queue --> WorkerPool
    Concelier --> AdvisoryRaw
    Excititor --> VexRaw
    WorkerPool --> AdvisoryRaw
    WorkerPool --> VexRaw
    WorkerPool --> SBOM
    WorkerPool --> Materializer
    Materializer --> Mongo
    WorkerPool --> Blob
    API --> Mongo
    API --> Blob
    API --> Authority
    Orchestrator --> Mongo
    Authority --> API
```
|
||||
Key notes:
|
||||
|
||||
- API host exposes lifecycle, run, simulate, findings endpoints with DPoP-bound OAuth enforcement.
|
||||
- Orchestrator manages run scheduling/fairness; writes run tickets to queue, leases jobs to worker pool.
|
||||
- Workers evaluate policies using cached IR; join external services via tenant-scoped clients; pull immutable advisories/VEX from the raw stores; write derived overlays to Mongo and optional explain bundles to blob storage.
|
||||
- Observability (metrics/traces/logs) integrated via OpenTelemetry (not shown).
|
||||
|
||||
---
|
||||
|
||||
### 2.1 · AOC inputs & immutability
|
||||
|
||||
- **Raw-only reads.** Evaluation workers access `advisory_raw` / `vex_raw` via tenant-scoped Mongo clients or the Concelier/Excititor raw APIs. No Policy Engine component is permitted to mutate these collections.
|
||||
- **Guarded ingestion.** `AOCWriteGuard` rejects forbidden fields before data reaches the raw stores. Policy tests replay known `ERR_AOC_00x` violations to confirm ingestion compliance.
|
||||
- **Change streams as contract.** Run orchestration stores resumable cursors for raw change streams. Replays of these cursors (e.g., after failover) must yield identical materialisation outcomes.
|
||||
- **Derived stores only.** All severity, consensus, and suppression state lives in `effective_finding_*` collections and explain bundles owned by Policy Engine. Provenance fields link back to raw document IDs so auditors can trace every verdict.
|
||||
- **Authority scopes.** Only the Policy Engine service identity holds `effective:write`. Ingestion identities retain `advisory:*`/`vex:*` scopes, ensuring separation of duties enforced by Authority and the API Gateway.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Module Breakdown
|
||||
|
||||
| Module | Responsibility | Notes |
|--------|----------------|-------|
| **Configuration** (`Configuration/`) | Bind settings (Mongo URIs, queue options, service URLs, sealed mode), validate on start. | Strict schema; fails fast on missing secrets. |
| **Authority Client** (`Authority/`) | Acquire tokens, enforce scopes, perform DPoP key rotation. | Only service identity uses `effective:write`. |
| **DSL Compiler** (`Dsl/`) | Parse, canonicalise, generate IR, cache by checksum. | Uses Roslyn-like pipeline; caches by `policyId+version+hash`. |
| **Selection Layer** (`Selection/`) | Batch SBOM ↔ advisory ↔ VEX joiners; apply equivalence tables; support incremental cursors. | Deterministic ordering (SBOM → advisory → VEX). |
| **Evaluator** (`Evaluation/`) | Execute IR with first-match semantics, compute severity/trust/reachability weights, record rule hits. | Stateless; all inputs provided by selection layer. |
| **Materialiser** (`Materialization/`) | Upsert effective findings, append history, manage explain bundle exports. | Mongo transactions per SBOM chunk. |
| **Orchestrator** (`Runs/`) | Change-stream ingestion, fairness, retry/backoff, queue writer. | Works with Scheduler Models DTOs. |
| **API** (`Api/`) | Minimal API endpoints, DTO validation, problem responses, idempotency. | Generated clients for CLI/UI. |
| **Observability** (`Telemetry/`) | Metrics (`policy_run_seconds`, `rules_fired_total`), traces, structured logs. | Sampled rule-hit logs with redaction. |
| **Offline Adapter** (`Offline/`) | Bundle export/import (policies, simulations, runs), sealed-mode enforcement. | Uses DSSE signing via Signer service. |
|
||||
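The Evaluator's first-match semantics can be illustrated with a few lines: rules are tried in order and the first one whose predicate holds decides the verdict. This is a sketch under assumptions, not the engine's IR; the `Rule` shape, the rule names, and the `vex_status` input key are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical compiled rule: a predicate plus the status it assigns."""

    name: str
    matches: Callable[[dict], bool]
    status: str


def evaluate(rules: List[Rule], inputs: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (status, rule name) for the first matching rule, else (None, None).

    The function keeps no state between calls; everything it needs
    arrives in `inputs`, mirroring the "stateless" note in the table.
    """
    for rule in rules:
        if rule.matches(inputs):
            return rule.status, rule.name  # later rules are never consulted
    return None, None


RULES = [
    Rule("vex_not_affected", lambda t: t.get("vex_status") == "not_affected", "not_affected"),
    Rule("default_affected", lambda t: True, "affected"),
]
hit = evaluate(RULES, {"vex_status": "not_affected"})
fallthrough = evaluate(RULES, {"vex_status": None})
```

Rule order is part of the policy's meaning under first-match evaluation, so any reordering changes verdicts and must be treated as a policy change.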
---
|
||||
|
||||
## 4 · Data Model & Persistence
|
||||
|
||||
### 4.1 Collections
|
||||
|
||||
- `policies` – policy versions, metadata, lifecycle states, simulation artefact references.
|
||||
- `policy_runs` – run records, inputs (cursors, env), stats, determinism hash, run status.
|
||||
- `policy_run_events` – append-only log (queued, leased, completed, failed, canceled, replay).
|
||||
- `effective_finding_{policyId}` – current verdict snapshot per finding.
|
||||
- `effective_finding_{policyId}_history` – append-only history (previous verdicts, timestamps, runId).
|
||||
- `policy_reviews` – review comments/decisions.
|
||||
|
||||
### 4.2 Schema Highlights
|
||||
|
||||
- Run records include `changeDigests` (hash of advisory/VEX inputs) for replay verification.
|
||||
- Effective findings store provenance references (`advisory_raw_ids`, `vex_raw_ids`, `sbom_component_id`).
|
||||
- All collections include `tenant`, `policyId`, `version`, `createdAt`, `updatedAt`, `traceId` for audit.
|
||||
|
||||
### 4.3 Indexing
|
||||
|
||||
- Compound indexes: `{tenant, policyId, status}` on `policies`; `{tenant, policyId, status, startedAt}` on `policy_runs`; `{policyId, sbomId, findingKey}` on findings.
|
||||
- TTL indexes on transient explain bundle references (configurable).
|
||||
|
||||
---
|
||||
|
||||
## 5 · Evaluation Pipeline
|
||||
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant Worker as EvaluationWorker
    participant Compiler as CompilerCache
    participant Selector as SelectionLayer
    participant Eval as Evaluator
    participant Mat as Materialiser
    participant Expl as ExplainStore

    Worker->>Compiler: Load IR (policyId, version, digest)
    Compiler-->>Worker: CompiledPolicy (cached or compiled)
    Worker->>Selector: Fetch tuple batches (sbom, advisory, vex)
    Selector-->>Worker: Deterministic batches (1024 tuples)
    loop For each batch
        Worker->>Eval: Execute rules (batch, env)
        Eval-->>Worker: Verdicts + rule hits
        Worker->>Mat: Upsert effective findings
        Mat-->>Worker: Success
        Worker->>Expl: Persist sampled explain traces (optional)
    end
    Worker->>Mat: Append history + run stats
    Worker-->>Worker: Compute determinism hash
    Worker->>+Mat: Finalize transaction
    Mat-->>-Worker: Ack
```
|
||||
Determinism guard instrumentation wraps the evaluator, rejecting access to forbidden APIs and ensuring batch ordering remains stable.
|
||||
|
||||
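One way to picture the determinism hash step is a digest over verdicts serialised in a canonical form and a fixed order, so that two runs over the same inputs agree byte for byte. The actual composition of the hash is not specified here; the sketch below assumes SHA-256 over canonical JSON sorted by `findingId`, and `determinism_hash` is a hypothetical helper name.

```python
import hashlib
import json
from typing import Iterable


def determinism_hash(verdicts: Iterable[dict]) -> str:
    """Digest verdicts independent of arrival order and dict key order."""
    ordered = sorted(verdicts, key=lambda v: v["findingId"])
    digest = hashlib.sha256()
    for verdict in ordered:
        # sort_keys + fixed separators give one byte sequence per verdict.
        canonical = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
        digest.update(canonical.encode("utf-8"))
        digest.update(b"\n")  # record delimiter between verdicts
    return "sha256:" + digest.hexdigest()


run_a = [
    {"findingId": "f1", "status": "affected"},
    {"findingId": "f2", "status": "not_affected"},
]
run_b = list(reversed(run_a))  # same verdicts, different arrival order
run_c = [run_a[0], {"findingId": "f2", "status": "affected"}]

same = determinism_hash(run_a) == determinism_hash(run_b)
differs = determinism_hash(run_a) != determinism_hash(run_c)
```

Golden tests can then replay a recorded run and compare only this one string instead of diffing every finding.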
---
|
||||
|
||||
## 6 · Run Orchestration & Incremental Flow
|
||||
|
||||
- **Change streams:** Concelier and Excititor publish document changes to the scheduler queue (`policy.trigger.delta`). Payload includes `tenant`, `source`, `linkset digests`, `cursor`.
|
||||
- **Orchestrator:** Maintains per-tenant backlog; merges deltas until time/size thresholds met, then enqueues `PolicyRunRequest`.
|
||||
- **Queue:** Mongo queue with lease; each job assigned `leaseDuration`, `maxAttempts`.
|
||||
- **Workers:** Lease jobs, execute evaluation pipeline, report status (success/failure/canceled). Failures with recoverable errors requeue with backoff; determinism or schema violations mark job `failed` and raise incident event.
|
||||
- **Fairness:** Round-robin per `{tenant, policyId}`; emergency jobs (`priority=emergency`) jump queue but limited via circuit breaker.
|
||||
- **Replay:** On demand, orchestrator rehydrates run via stored cursors and exports sealed bundle for audit/CI determinism checks.
|
||||
|
||||
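The round-robin fairness rule can be sketched as one lane per `{tenant, policyId}` pair, with the lane that was just served moving to the back of the rotation. This is an illustrative in-memory model only: `FairQueue` is a hypothetical name, and leases, retries, and the emergency priority path are deliberately left out.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional, Tuple

LaneKey = Tuple[str, str]  # (tenant, policyId)


class FairQueue:
    """Round-robin across lanes so one busy tenant cannot starve the rest."""

    def __init__(self) -> None:
        self._lanes: "OrderedDict[LaneKey, Deque[str]]" = OrderedDict()

    def enqueue(self, tenant: str, policy_id: str, job: str) -> None:
        self._lanes.setdefault((tenant, policy_id), deque()).append(job)

    def dequeue(self) -> Optional[str]:
        if not self._lanes:
            return None
        key, lane = next(iter(self._lanes.items()))  # lane at the front
        job = lane.popleft()
        del self._lanes[key]
        if lane:
            self._lanes[key] = lane  # re-insert at the back of the rotation
        return job


queue = FairQueue()
queue.enqueue("tenant-a", "P-7", "a1")
queue.enqueue("tenant-a", "P-7", "a2")
queue.enqueue("tenant-b", "P-9", "b1")
order = [queue.dequeue() for _ in range(4)]
```

Even though `tenant-a` queued two jobs first, `tenant-b` is served after a single `tenant-a` job.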
---
|
||||
|
||||
## 7 · Security & Tenancy
|
||||
|
||||
- **Auth:** All API calls pass through Authority gateway; DPoP tokens enforced for service-to-service (Policy Engine service principal). CLI/UI tokens include scope claims.
|
||||
- **Scopes:** Mutations require `policy:*` scopes corresponding to action; `effective:write` restricted to service identity.
|
||||
- **Tenancy:** All queries filter by `tenant`. Service identity uses `tenant-global` for shared policies; cross-tenant reads prohibited unless `policy:tenant-admin` scope present.
|
||||
- **Secrets:** Configuration loaded via environment variables or sealed secrets; runtime avoids writing secrets to logs.
|
||||
- **Determinism guard:** Static analyzer prevents referencing forbidden namespaces; runtime guard intercepts `DateTime.Now`, `Random`, `Guid`, HTTP clients beyond allow-list.
|
||||
- **Sealed mode:** Global flag disables outbound network except allow-listed internal hosts; watchers fail fast if unexpected egress attempted.
|
||||
|
||||
---
|
||||
|
||||
## 8 · Observability
|
||||
|
||||
- Metrics:
|
||||
- `policy_run_seconds{mode,tenant,policy}` (histogram)
|
||||
- `policy_run_queue_depth{tenant}`
|
||||
- `policy_rules_fired_total{policy,rule}`
|
||||
- `policy_vex_overrides_total{policy,vendor}`
|
||||
- Logs: Structured JSON with `traceId`, `policyId`, `version`, `runId`, `tenant`, `phase`. Guard ensures no sensitive data leakage.
|
||||
- Traces: Spans `policy.select`, `policy.evaluate`, `policy.materialize`, `policy.simulate`. Trace IDs surfaced to CLI/UI.
|
||||
- Incident mode toggles 100 % sampling and extended retention windows.
|
||||
|
||||
---
|
||||
|
||||
## 9 · Offline / Bundle Integration
|
||||
|
||||
- **Imports:** Offline Kit delivers policy packs, advisory/VEX snapshots, SBOM updates. Policy Engine ingests bundles via `offline import`.
|
||||
- **Exports:** `stella policy bundle export` packages policy, IR digest, simulations, run metadata; UI provides export triggers.
|
||||
- **Sealed hints:** Explain traces annotate when cached values used (EPSS, KEV). Run records mark `env.sealed=true`.
|
||||
- **Sync cadence:** Operators perform monthly bundle sync; Policy Engine warns when snapshots are older than the configured staleness threshold (default 14 days).
|
||||
---
|
||||
|
||||
## 10 · Testing & Quality
|
||||
|
||||
- **Unit tests:** DSL parsing, evaluator semantics, guard enforcement.
|
||||
- **Integration tests:** Joiners with sample SBOM/advisory/VEX data; materialisation with deterministic ordering; API contract tests generated from OpenAPI.
|
||||
- **Property tests:** Ensure rule evaluation deterministic across permutations.
|
||||
- **Golden tests:** Replay recorded runs, compare determinism hash.
|
||||
- **Performance tests:** Evaluate 100k component / 1M advisory dataset under warmed caches (<30 s full run).
|
||||
- **Chaos hooks:** Optional toggles to simulate upstream latency/failures; used in staging.
|
||||
|
||||
---
|
||||
|
||||
## 11 · Compliance Checklist
|
||||
|
||||
- [ ] **Determinism guard enforced:** Static analyzer + runtime guard block wall-clock, RNG, unauthorized network calls.
|
||||
- [ ] **Incremental correctness:** Change-stream cursors stored and replayed during tests; unit/integration coverage for dedupe.
|
||||
- [ ] **RBAC validated:** Endpoint scope requirements match Authority configuration; integration tests cover deny/allow.
|
||||
- [ ] **AOC separation enforced:** No code path writes to `advisory_raw` / `vex_raw`; integration tests capture `ERR_AOC_00x` handling; read-only clients verified.
|
||||
- [ ] **Effective findings ownership:** Only Policy Engine identity holds `effective:write`; unauthorized callers receive `ERR_AOC_006`.
|
||||
- [ ] **Observability wired:** Metrics/traces/logs exported with correlation IDs; dashboards include `aoc_violation_total` and ingest latency panels.
|
||||
- [ ] **Offline parity:** Sealed-mode tests executed; bundle import/export flows documented and validated.
|
||||
- [ ] **Schema docs synced:** DTOs match Scheduler Models (`SCHED-MODELS-20-001`); JSON schemas committed.
|
||||
- [ ] **Security reviews complete:** Threat model (including queue poisoning, determinism bypass, data exfiltration) documented; mitigations in place.
|
||||
- [ ] **Disaster recovery rehearsed:** Run replay+rollback procedures tested and recorded.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 19).*
|
||||
@@ -8,7 +8,9 @@
|
||||
| `DOCKER_HOST` | How containers reach your Docker daemon (because we no longer mount `/var/run/docker.sock`) | `tcp://docker:2375` |
| `WORKSPACE` | Directory where the pipeline stores artefacts (SBOM file) | `$(pwd)` |
| `IMAGE` | The image you are building & scanning | `acme/backend:sha-${COMMIT_SHA}` |
| `SBOM_FILE` | Immutable SBOM name – `<image-ref>‑YYYYMMDDThhmmssZ.sbom.json` | `acme_backend_sha‑abc123‑20250804T153050Z.sbom.json` |
|
||||
> **Authority graph scopes note (2025‑10‑27):** CI stages that spin up the Authority compose profile now rely on the checked-in `etc/authority.yaml`. Before running integration smoke jobs, inject real secrets for every `etc/secrets/*.secret` file (Cartographer, Graph API, Policy Engine, Concelier, Excititor). The repository defaults contain `*-change-me` placeholders and Authority will reject tokens if those secrets are not overridden.
|
||||
|
||||
```bash
|
||||
export STELLA_URL="stella-ops.ci.acme.example"
|
||||
@@ -291,6 +293,40 @@ Host the resulting bundle via any static file server for review (for example `py
|
||||
- [ ] Markdown link check (`npx markdown-link-check`) reports no broken references.
|
||||
- [ ] Preview bundle archived (or attached) for stakeholders.
|
||||
|
||||
### 4.5 Policy DSL lint stage
|
||||
|
||||
Policy Engine v2 pipelines now fail fast if policy documents are malformed. After checkout and `dotnet restore`, run:

```bash
dotnet run \
  --project tools/PolicyDslValidator/PolicyDslValidator.csproj \
  -- \
  --strict docs/examples/policies/*.yaml
```

- `--strict` treats warnings as errors so missing metadata doesn’t slip through.
- The validator accepts globs, so you can point it at tenant policy directories later (`policies/**/*.yaml`).
- Exit codes follow UNIX conventions: `0` success, `1` parse/errors, `2` warnings when `--strict` is set, `64` usage mistakes.
|
||||
Capture the validator output as part of your build logs; Support uses it when triaging policy rollout issues.
|
||||
|
||||
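A CI wrapper may want to tell policy problems apart from a misconfigured invocation when it annotates build logs. The sketch below only encodes the four exit codes listed above; the helper names `describe_exit` and `is_policy_problem` are hypothetical, and how the validator is launched is left to the pipeline.

```python
# Meanings taken from the exit-code list above; anything else is unexpected.
EXIT_MEANINGS = {
    0: "success",
    1: "errors (including parse failures)",
    2: "warnings promoted to failures by --strict",
    64: "usage mistake",
}


def describe_exit(code: int) -> str:
    """Human-readable label for a validator exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def is_policy_problem(code: int) -> bool:
    """True when the policy files need attention (codes 1 and 2).

    Code 64 points at the pipeline's own command line instead, so it is
    reported separately rather than blamed on policy authors.
    """
    return code in (1, 2)


labels = [describe_exit(code) for code in (0, 2, 64, 3)]
```

Splitting the two failure classes lets the pipeline route code `64` to whoever owns the CI definition and codes `1` and `2` to policy authors.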
### 4.6 Policy simulation smoke
|
||||
|
||||
Catch unexpected policy regressions by exercising a small set of golden SBOM findings via the simulation smoke tool:
|
||||
|
||||
```bash
dotnet run \
  --project tools/PolicySimulationSmoke/PolicySimulationSmoke.csproj \
  -- \
  --scenario-root samples/policy/simulations \
  --output artifacts/policy-simulations
```

- The tool loads each `scenario.json` under `samples/policy/simulations`, evaluates the referenced policy, and fails the build if projected verdicts change.
- In CI the command runs twice (to `run1/` and `run2/`) and `diff -u` compares the summaries; any mismatch signals a determinism regression.
- Artifacts land in `artifacts/policy-simulations/policy-simulation-summary.json`; upload them for later inspection (see CI workflow).
- Expand scenarios by copying real-world findings into the samples directory; ensure expected statuses are recorded so regressions trip the pipeline.
|
||||
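The double-run determinism check can also be scripted where `diff` is unavailable. The sketch below compares the two summaries as parsed JSON, which is looser than `diff -u` because key order and whitespace are ignored; the summary contents written in the demo are placeholders, since the real schema is not described here, and `summaries_match` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

SUMMARY_NAME = "policy-simulation-summary.json"


def summaries_match(run1: Path, run2: Path) -> bool:
    """Compare both runs' summaries as parsed JSON (key order is ignored)."""
    first = json.loads((run1 / SUMMARY_NAME).read_text(encoding="utf-8"))
    second = json.loads((run2 / SUMMARY_NAME).read_text(encoding="utf-8"))
    return first == second


# Demo with placeholder summaries in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    run1, run2 = Path(root, "run1"), Path(root, "run2")
    for run_dir in (run1, run2):
        run_dir.mkdir()
        (run_dir / SUMMARY_NAME).write_text(
            '{"scenarios": 3, "failed": 0}', encoding="utf-8"
        )
    stable = summaries_match(run1, run2)
    # Simulate a second run that produced a different outcome.
    (run2 / SUMMARY_NAME).write_text(
        '{"scenarios": 3, "failed": 1}', encoding="utf-8"
    )
    drifted = summaries_match(run1, run2)
```

If byte-level stability matters, for example when summaries are later signed, keep the `diff -u` comparison as the authoritative gate.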
---
|
||||
|
||||
## 5 · Troubleshooting cheat‑sheet
|
||||
|
||||
284
docs/cli/cli-reference.md
Normal file
@@ -0,0 +1,284 @@
|
||||
# CLI AOC Commands Reference
|
||||
|
||||
> **Audience:** DevEx engineers, operators, and CI authors integrating the `stella` CLI with Aggregation-Only Contract (AOC) workflows.
|
||||
> **Scope:** Command synopsis, options, exit codes, and offline considerations for `stella sources ingest --dry-run` and `stella aoc verify` as introduced in Sprint 19.
|
||||
|
||||
Both commands are designed to enforce the AOC guardrails documented in the [aggregation-only reference](../ingestion/aggregation-only-contract.md) and the [architecture overview](../architecture/overview.md). They consume Authority-issued tokens with tenant scopes and never mutate ingestion stores.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Prerequisites
|
||||
|
||||
- CLI version: `stella` ≥ 0.19.0 (AOC feature gate enabled).
|
||||
- Required scopes (DPoP-bound):
|
||||
- `advisory:verify` for Concelier sources.
|
||||
- `vex:verify` for Excititor sources (needed only when running VEX checks).
- `tenant:select` if your deployment uses tenant switching.
|
||||
- Connectivity: direct access to Concelier/Excititor APIs or Offline Kit snapshot (see § 4).
|
||||
- Environment: set `STELLA_AUTHORITY_URL`, `STELLA_TENANT`, and export a valid OpTok via `stella auth login` or existing token cache.
|
||||
|
||||
---
|
||||
|
||||
## 2 · `stella sources ingest --dry-run`
|
||||
|
||||
### 2.1 Synopsis
|
||||
|
||||
```bash
stella sources ingest --dry-run \
  --source <source-key> \
  --input <path-or-uri> \
  [--tenant <tenant-id>] \
  [--format json|table] \
  [--no-color] \
  [--output <file>]
```
|
||||
|
||||
### 2.2 Description
|
||||
|
||||
Previews an ingestion write without touching MongoDB. The command loads an upstream advisory or VEX document, computes the would-write payload, runs it through the `AOCWriteGuard`, and reports any forbidden fields, provenance gaps, or idempotency issues. Use it during connector development, CI validation, or while triaging incidents.
|
||||
|
||||
### 2.3 Options
|
||||
|
||||
| Option | Description |
|--------|-------------|
| `--source <source-key>` | Logical source name (`redhat`, `ubuntu`, `osv`, etc.). Mirrors connector configuration. |
| `--input <path-or-uri>` | Path to local CSAF/OSV/VEX file or HTTPS URI. CLI normalises transport (gzip/base64) before guard evaluation. |
| `--tenant <tenant-id>` | Overrides default tenant for multi-tenant deployments. Mandatory when `STELLA_TENANT` is not set. |
| `--format json\|table` | Output format. `table` (default) prints summary with highlighted violations; `json` emits machine-readable report (see below). |
| `--no-color` | Disables ANSI colour output for CI logs. |
| `--output <file>` | Writes the JSON report to file while still printing human-readable summary to stdout. |
|
||||
|
||||
### 2.4 Output schema (JSON)
|
||||
|
||||
```json
{
  "source": "redhat",
  "tenant": "default",
  "guardVersion": "1.0.0",
  "status": "ok",
  "document": {
    "contentHash": "sha256:…",
    "supersedes": null,
    "provenance": {
      "signature": { "format": "pgp", "present": true }
    }
  },
  "violations": []
}
```
|
||||
|
||||
When violations exist, `status` becomes `error` and `violations` contains entries with `code` (`ERR_AOC_00x`), a short `message`, and JSON Pointer `path` values indicating offending fields.
|
||||
|
||||
### 2.5 Exit codes
|
||||
|
||||
| Exit code | Meaning |
|
||||
|-----------|---------|
|
||||
| `0` | Guard passed; would-write payload is AOC compliant. |
|
||||
| `11` | `ERR_AOC_001` – Forbidden field (`severity`, `cvss`, etc.) detected. |
|
||||
| `12` | `ERR_AOC_002` – Merge attempt (multiple upstream sources fused). |
|
||||
| `13` | `ERR_AOC_003` – Idempotency violation (duplicate without supersedes). |
|
||||
| `14` | `ERR_AOC_004` – Missing provenance fields. |
|
||||
| `15` | `ERR_AOC_005` – Signature/checksum mismatch. |
|
||||
| `16` | `ERR_AOC_006` – Effective findings present (Policy-only data). |
|
||||
| `17` | `ERR_AOC_007` – Unknown top-level fields / schema violation. |
|
||||
| `70` | Transport error (network, auth, malformed input). |
|
||||
|
||||
> Exit codes map directly to the `ERR_AOC_00x` table for scripting consistency. Multiple violations yield the highest-priority code (e.g., 11 takes precedence over 14).
|
||||
|
||||
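As an illustration of the precedence rule, the following sketch derives the exit code from a list of violation codes. It assumes that "highest priority" means the lowest-numbered `ERR_AOC_00x` code, which is inferred only from the "11 takes precedence over 14" example; `aoc_exit_for_codes` is a hypothetical helper, not part of the CLI.

```bash
# Map ERR_AOC_001..ERR_AOC_007 to exit codes 11..17 (per the table in § 2.5)
# and print the one that wins when several violations are present.
# Assumption: the lowest-numbered code has the highest priority.
aoc_exit_for_codes() {
  aoc_best=0
  for aoc_code in "$@"; do
    case "$aoc_code" in
      ERR_AOC_00[1-7])
        aoc_n=${aoc_code#ERR_AOC_00}
        aoc_exit=$((10 + aoc_n))
        if [ "$aoc_best" -eq 0 ] || [ "$aoc_exit" -lt "$aoc_best" ]; then
          aoc_best=$aoc_exit
        fi
        ;;
      *)
        echo "unknown code: $aoc_code" >&2
        return 2
        ;;
    esac
  done
  # With no violations the result is 0 (guard passed).
  echo "$aoc_best"
}

# Usage:
#   aoc_exit_for_codes ERR_AOC_004 ERR_AOC_001
```

A script can use this to predict which exit code to expect from a JSON report that lists several violation codes.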
### 2.6 Examples
|
||||
|
||||
Dry-run a local CSAF file:
|
||||
|
||||
```bash
stella sources ingest --dry-run \
  --source redhat \
  --input ./fixtures/redhat/RHSA-2025-1234.json
```
|
||||
|
||||
Stream from HTTPS and emit JSON for CI:
|
||||
|
||||
```bash
stella sources ingest --dry-run \
  --source osv \
  --input https://osv.dev/vulnerability/GHSA-aaaa-bbbb \
  --format json \
  --output artifacts/osv-dry-run.json

# Inspect only the violations from the saved report.
jq '.violations' artifacts/osv-dry-run.json
```
|
||||
|
||||
### 2.7 Offline notes
|
||||
|
||||
When operating in sealed/offline mode:
|
||||
|
||||
- Use `--input` paths pointing to Offline Kit snapshots (`offline-kit/advisories/*.json`).
|
||||
- Provide `--tenant` explicitly if the offline bundle contains multiple tenants.
|
||||
- The command does not attempt network access when given a file path.
|
||||
- Store reports with `--output` to include in transfer packages for policy review.
|
||||
|
||||
---
|
||||
|
||||
## 3 · `stella aoc verify`
|
||||
|
||||
### 3.1 Synopsis
|
||||
|
||||
```bash
stella aoc verify \
  [--since <iso8601|duration>] \
  [--limit <count>] \
  [--sources <list>] \
  [--codes <ERR_AOC_00x,...>] \
  [--format table|json] \
  [--export <file>] \
  [--tenant <tenant-id>] \
  [--no-color]
```
|
||||
|
||||
### 3.2 Description
|
||||
|
||||
Replays the AOC guard against stored raw documents. By default it checks all advisories and VEX statements ingested in the last 24 hours for the active tenant, reporting totals, top violation codes, and sample documents. Use it in CI pipelines, scheduled verifications, or during incident response.
|
||||
|
||||
### 3.3 Options
|
||||
|
||||
| Option | Description |
|--------|-------------|
| `--since <value>` | Verification window. Accepts ISO 8601 timestamp (`2025-10-25T12:00:00Z`) or duration (`48h`, `7d`). Defaults to `24h`. |
| `--limit <count>` | Maximum number of violations to display (per code). `0` means show all. Defaults to `20`. |
| `--sources <list>` | Comma-separated list of sources (`redhat,ubuntu,osv`). Filters both advisories and VEX entries. |
| `--codes <list>` | Restricts output to specific `ERR_AOC_00x` codes. Useful for regression tracking. |
| `--format table\|json` | `table` (default) prints summary plus top violations; `json` outputs machine-readable report identical to the `/aoc/verify` API. |
| `--export <file>` | Writes the JSON report to disk (useful for audits/offline uploads). |
| `--tenant <tenant-id>` | Overrides tenant context. Required for cross-tenant verifications when run by platform operators. |
| `--no-color` | Disables ANSI colours. |
|
||||
|
||||
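Because an invalid `--since` value ends the run with a misconfiguration exit code, a pipeline may want to validate the value before calling the CLI. The sketch below accepts only the two forms documented in the table (an `<N>h`/`<N>d` duration or a UTC timestamp like the example shown). `normalize_since` is a hypothetical helper, and the CLI may accept other forms that this check rejects.

```bash
# Hypothetical pre-flight check for --since values.
# Durations are printed as hours (7d -> 168h); timestamps are printed unchanged.
normalize_since() {
  since_value=$1
  since_num=""
  since_mult=1
  case "$since_value" in
    *h) since_num=${since_value%h}; since_mult=1 ;;
    *d) since_num=${since_value%d}; since_mult=24 ;;
  esac
  case "$since_num" in
    "" | *[!0-9]* | 0[0-9]*) ;;   # not a plain decimal count; try the timestamp form
    *) echo "$((since_num * since_mult))h"; return 0 ;;
  esac
  case "$since_value" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z)
      echo "$since_value"; return 0 ;;
  esac
  echo "invalid --since value: $since_value" >&2
  return 1
}

# Usage:
#   window=$(normalize_since 7d) || exit 71
```

The timestamp pattern only checks the shape of the string, not whether the date itself is valid.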
### 3.4 Report structure (JSON)
|
||||
|
||||
```json
{
  "tenant": "default",
  "window": {
    "from": "2025-10-25T12:00:00Z",
    "to": "2025-10-26T12:00:00Z"
  },
  "checked": {
    "advisories": 482,
    "vex": 75
  },
  "violations": [
    {
      "code": "ERR_AOC_001",
      "count": 2,
      "examples": [
        {
          "source": "redhat",
          "documentId": "advisory_raw:redhat:RHSA-2025:1",
          "contentHash": "sha256:…",
          "path": "/content/raw/cvss"
        }
      ]
    }
  ],
  "metrics": {
    "ingestion_write_total": 557,
    "aoc_violation_total": 2
  }
}
```
|
||||
|
||||
### 3.5 Exit codes
|
||||
|
||||
| Exit code | Meaning |
|
||||
|-----------|---------|
|
||||
| `0` | Verification succeeded with zero violations. |
|
||||
| `11…17` | Same mapping as § 2.5 when violations are detected. Highest-priority code returned. |
|
||||
| `18` | Verification ran but results truncated (limit reached) – treat as warning; rerun with higher `--limit`. |
|
||||
| `70` | Transport/authentication error. |
|
||||
| `71` | CLI misconfiguration (missing tenant, invalid `--since`, etc.). |
|
||||
|
||||
### 3.6 Examples
|
||||
|
||||
Daily verification across all sources:
|
||||
|
||||
```bash
|
||||
stella aoc verify --since 24h --format table
|
||||
```
|
||||
|
||||
CI pipeline focusing on errant sources and exporting evidence:
|
||||
|
||||
```bash
stella aoc verify \
  --sources redhat,ubuntu \
  --codes ERR_AOC_001,ERR_AOC_004 \
  --format json \
  --limit 100 \
  --export artifacts/aoc-verify.json

jq '.violations[] | {code, count}' artifacts/aoc-verify.json
```
|
||||
|
||||
Air-gapped verification using Offline Kit snapshot (example script):
|
||||
|
||||
```bash
stella aoc verify \
  --since 7d \
  --format json \
  --export /mnt/offline/aoc-verify-$(date +%F).json

sha256sum /mnt/offline/aoc-verify-*.json > /mnt/offline/checksums.txt
```
|
||||
|
||||
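On the receiving side of an air-gapped transfer, the recorded checksums can be verified with `sha256sum -c`. The following is a minimal sketch, assuming GNU coreutils `sha256sum` and the `aoc-verify-*.json` naming used above; `write_checksums` and `verify_checksums` are hypothetical helper names. It records file names relative to the bundle directory so the bundle still verifies after being copied to a different mount point.

```bash
# Record checksums with relative file names inside the bundle directory.
# The subshell body keeps the `cd` from affecting the caller.
write_checksums() (
  cd "$1" || exit 1
  sha256sum aoc-verify-*.json > checksums.txt
)

# Verify every file listed in checksums.txt; non-zero exit on any mismatch.
verify_checksums() (
  cd "$1" || exit 1
  sha256sum -c checksums.txt
)

# Usage (directory is illustrative):
#   write_checksums /mnt/offline      # on the exporting host
#   verify_checksums /mnt/offline     # after transfer
```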
### 3.7 Automation tips
|
||||
|
||||
- Schedule with `cron` or a platform scheduler and fail the job on exit codes `11`–`17`, `70`, and `71`; treat `18` (truncated results) as a warning, as noted in § 3.5.
|
||||
- Pair with `stella sources ingest --dry-run` for pre-flight validation before re-enabling a paused source.
|
||||
- Push JSON exports to observability pipelines for historical tracking of violation counts.
|
||||
|
||||
### 3.8 Offline notes
|
||||
|
||||
- Works against Offline Kit Mongo snapshots when CLI is pointed at the local API gateway included in the bundle.
|
||||
- When fully disconnected, run against exported `aoc verify` reports generated on production and replay them using `--format json --export` (automation recipe above).
|
||||
- Include verification output in compliance packages alongside Offline Kit manifests.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Global exit-code reference
|
||||
|
||||
| Code | Summary |
|
||||
|------|---------|
|
||||
| `0` | Success / no violations. |
|
||||
| `11` | `ERR_AOC_001` – Forbidden field present. |
|
||||
| `12` | `ERR_AOC_002` – Merge attempt detected. |
|
||||
| `13` | `ERR_AOC_003` – Idempotency violation. |
|
||||
| `14` | `ERR_AOC_004` – Missing provenance/signature metadata. |
|
||||
| `15` | `ERR_AOC_005` – Signature/checksum mismatch. |
|
||||
| `16` | `ERR_AOC_006` – Effective findings in ingestion payload. |
|
||||
| `17` | `ERR_AOC_007` – Schema violation / unknown fields. |
|
||||
| `18` | Partial verification (limit reached). |
|
||||
| `70` | Transport or HTTP failure. |
|
||||
| `71` | CLI usage error (invalid arguments, missing tenant). |
|
||||
|
||||
Use these codes in CI to map outcomes to build statuses or alert severities.
|
||||
|
||||
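One way to apply this mapping in a pipeline is a small classifier over the exit code. This is a sketch under assumptions: the outcome labels (`pass`, `fail`, `warn`, `error`) are illustrative and `classify_aoc_exit` is a hypothetical name. Only the numeric codes come from the table above.

```bash
# Translate a stella AOC exit code into a coarse CI outcome.
classify_aoc_exit() {
  case "$1" in
    0) echo "pass" ;;
    1[1-7]) echo "fail" ;;     # ERR_AOC_001..ERR_AOC_007
    18) echo "warn" ;;         # partial verification (limit reached)
    70 | 71) echo "error" ;;   # transport failure or CLI usage error
    *) echo "unknown" ;;
  esac
}

# Usage (requires the stella CLI, so it is shown as a comment only):
#   stella aoc verify --since 24h --format json --export report.json
#   outcome=$(classify_aoc_exit "$?")
```

Keeping `warn` separate from `fail` lets a job surface truncated results without blocking the build.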
---
|
||||
|
||||
## 5 · Related references
|
||||
|
||||
- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
|
||||
- [Architecture overview](../architecture/overview.md)
|
||||
- [Console AOC dashboard](../ui/console.md)
|
||||
- [Authority scopes](../ARCHITECTURE_AUTHORITY.md)
|
||||
|
||||
---
|
||||
|
||||
## 6 · Compliance checklist
|
||||
|
||||
- [ ] Usage documented for both table and JSON formats.
|
||||
- [ ] Exit-code mapping matches `ERR_AOC_00x` definitions and automation guidance.
|
||||
- [ ] Offline/air-gap workflow captured for both commands.
|
||||
- [ ] References to AOC architecture and console docs included.
|
||||
- [ ] Examples validated against current CLI syntax (update post-implementation).
|
||||
- [ ] Docs guild screenshot/narrative placeholder logged for release notes (pending CLI team capture).
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 19).*
|
||||
284
docs/cli/policy.md
Normal file
@@ -0,0 +1,284 @@
|
||||
# Stella CLI — Policy Commands
|
||||
|
||||
> **Audience:** Policy authors, reviewers, operators, and CI engineers using the `stella` CLI to interact with Policy Engine.
|
||||
> **Supported from:** `stella` CLI ≥ 0.20.0 (Policy Engine v2 sprint line).
|
||||
> **Prerequisites:** Authority-issued bearer token with the scopes noted per command (export `STELLA_TOKEN` or pass `--token`).
|
||||
|
||||
---
|
||||
|
||||
## 1 · Global Options & Output Modes
|
||||
|
||||
All `stella policy *` commands honour the common CLI options:
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `--server <url>` | `https://stella.local` | Policy Engine gateway root. |
|
||||
| `--tenant <id>` | token default | Override tenant for multi-tenant installs. |
|
||||
| `--format <table\|json\|yaml>` | `table` for TTY, `json` otherwise | Output format for listings/diffs. |
|
||||
| `--output <file>` | stdout | Write full JSON payload to file. |
|
||||
| `--sealed` | false | Force sealed-mode behaviour (no outbound fetch). |
|
||||
| `--trace` | false | Emit verbose timing/log correlation info. |
|
||||
|
||||
> **Tip:** Set `STELLA_PROFILE=policy` in CI to load saved defaults from `~/.stella/profiles/policy.toml`.
|
||||
|
||||
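The `--format` default depends on whether stdout is a terminal. A wrapper script can follow the same convention with the POSIX `-t` test. The sketch below illustrates the general check only, not how the CLI implements it, and `default_format` is a hypothetical name.

```bash
# Pick an output format the way the table describes the default:
# "table" when stdout is a terminal, "json" otherwise.
default_format() {
  if [ -t 1 ]; then
    echo "table"
  else
    echo "json"
  fi
}

# Usage: when stdout is captured or piped, the function prints "json".
#   fmt=$(default_format)
```

Passing `--format` explicitly in CI avoids relying on this detection at all.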
---
|
||||
|
||||
## 2 · Authoring & Drafting Commands
|
||||
|
||||
### 2.1 `stella policy new`
|
||||
|
||||
Create a draft policy from a template or scratch.
|
||||
|
||||
```
stella policy new --policy-id P-7 --name "Default Org Policy" \
  --template baseline --output-path policies/P-7.stella
```
|
||||
|
||||
Options:
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--policy-id` *(required)* | Stable identifier (e.g., `P-7`). |
|
||||
| `--name` | Friendly display name. |
|
||||
| `--template` | `baseline`, `serverless`, `blank`. |
|
||||
| `--from` | Start from existing version (`policyId@version`). |
|
||||
| `--open` | Launches `$EDITOR` after creation. |
|
||||
|
||||
Writes DSL to local file and registers draft version (`status=draft`). Requires `policy:write`.
|
||||
|
||||
### 2.2 `stella policy edit`
|
||||
|
||||
Open an existing draft in the local editor.
|
||||
|
||||
```
|
||||
stella policy edit P-7 --version 4
|
||||
```
|
||||
|
||||
- Auto-checks out latest draft if `--version` omitted.
|
||||
- Saves to temp file, uploads on editor exit (unless `--no-upload`).
|
||||
- Use `--watch` to keep command alive and re-upload on every save.
|
||||
|
||||
### 2.3 `stella policy lint`
|
||||
|
||||
Static validation without submitting.
|
||||
|
||||
```
|
||||
stella policy lint policies/P-7.stella --format json
|
||||
```
|
||||
|
||||
Outputs diagnostics (line/column, code, message). Exit codes:
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| `0` | No lint errors. |
|
||||
| `10` | Syntax/compile errors (`ERR_POL_001`). |
|
||||
| `11` | Unsupported syntax version. |
|
||||
|
||||
### 2.4 `stella policy compile`
|
||||
|
||||
Emits IR digest and rule summary.
|
||||
|
||||
```
|
||||
stella policy compile P-7 --version 4
|
||||
```
|
||||
|
||||
Returns JSON with `digest`, `rules.count`, action counts. Exit `0` success, `10` on compile errors.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Lifecycle Workflow
|
||||
|
||||
### 3.1 Submit
|
||||
|
||||
```
stella policy submit P-7 --version 4 \
  --reviewer user:kay --reviewer group:sec-reviewers \
  --note "Simulated against golden SBOM set" \
  --attach sims/P-7-v4-vs-v3.json
```
|
||||
|
||||
Requires `policy:submit`. The CLI validates that lint and compile ran within the last 24 h and that bundle attachments exist.
|
||||
|
||||
### 3.2 Review
|
||||
|
||||
```
stella policy review P-7 --version 4 --approve \
  --note "Looks good; ensure incident playbook updated."
```
|
||||
|
||||
- `--approve`, `--request-changes`, or `--comment`.
|
||||
- Provide `--blocking` to mark comment as blocking.
|
||||
- Requires `policy:review`.
|
||||
|
||||
### 3.3 Approve
|
||||
|
||||
```
stella policy approve P-7 --version 4 \
  --note "Determinism CI green; simulation diff attached." \
  --attach sims/P-7-v4-vs-v3.json
```
|
||||
|
||||
Prompts for confirmation; refuses if approver == submitter. Requires `policy:approve`.
|
||||
|
||||
### 3.4 Activate
|
||||
|
||||
```
|
||||
stella policy activate P-7 --version 4 --run-now --priority high
|
||||
```
|
||||
|
||||
- Optional `--scheduled-at 2025-10-27T02:00:00Z`.
|
||||
- Requires `policy:activate` and `policy:run`.
|
||||
|
||||
### 3.5 Archive / Rollback
|
||||
|
||||
```
|
||||
stella policy archive P-7 --version 3 --reason "Superseded by v4"
|
||||
stella policy activate P-7 --version 3 --rollback --incident INC-2025-104
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4 · Simulation & Runs
|
||||
|
||||
### 4.1 Simulate
|
||||
|
||||
```
stella policy simulate P-7 \
  --base 3 --candidate 4 \
  --sbom sbom:S-42 --sbom sbom:S-318 \
  --env exposure=internet --env sealed=false \
  --format json --output sims/P-7-v4-vs-v3.json
```
|
||||
|
||||
Output fields (JSON):
|
||||
|
||||
```json
{
  "diff": {
    "added": 12,
    "removed": 8,
    "unchanged": 657,
    "bySeverity": {
      "Critical": {"up": 1, "down": 0},
      "High": {"up": 3, "down": 4}
    }
  },
  "explainUri": "blob://policy/P-7/simulations/2025-10-26.json"
}
```
|
||||
|
||||
> Schema reminder: CLI commands surface objects defined in `src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-20-001-POLICY-RUNS.md`; use the samples in `samples/api/scheduler/` for contract validation when extending output parsing.
|
||||
|
||||
Exit codes:
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| `0` | Simulation succeeded; diffs informational. |
|
||||
| `20` | Blocking delta (`--fail-on-diff` triggered). |
|
||||
| `21` | Simulation input missing (`ERR_POL_003`). |
|
||||
| `22` | Determinism guard (`ERR_POL_004`). |
|
||||
| `23` | API/permission error (`ERR_POL_002`, `ERR_POL_005`). |
|
||||
|
||||
### 4.2 Run
|
||||
|
||||
```
stella policy run P-7 --mode full \
  --sbom sbom:S-42 --env exposure=internal-only \
  --wait --watch
```
|
||||
|
||||
Options:
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--mode` | `full` or `incremental` (default incremental). |
|
||||
| `--sbom` | Explicit SBOM IDs (optional). |
|
||||
| `--priority` | `normal`, `high`, `emergency`. |
|
||||
| `--wait` | Poll run status until completion. |
|
||||
| `--watch` | Stream progress events (requires TTY). |
|
||||
|
||||
`stella policy run status <runId>` retrieves run metadata.
|
||||
`stella policy run list --status failed --limit 20` returns recent runs.
|
||||
|
||||
### 4.3 Replay & Cancel
|
||||
|
||||
```
|
||||
stella policy run replay run:P-7:2025-10-26:auto --output bundles/replay.tgz
|
||||
stella policy run cancel run:P-7:2025-10-26:auto
|
||||
```
|
||||
|
||||
Replay downloads sealed bundle for deterministic verification.
|
||||
|
||||
---
|
||||
|
||||
## 5 · Findings & Explainability
|
||||
|
||||
### 5.1 List Findings
|
||||
|
||||
```
stella findings ls --policy P-7 \
  --sbom sbom:S-42 \
  --status affected --severity High,Critical \
  --format table
```
|
||||
|
||||
Common flags:
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--page`, `--page-size` | Pagination (default page size 50). |
|
||||
| `--cursor` | Use cursor token from previous call. |
|
||||
| `--since` | ISO timestamp filter. |
|
||||
|
||||
### 5.2 Fetch Explain
|
||||
|
||||
```
stella findings explain --policy P-7 --finding P-7:S-42:pkg:npm/lodash@4.17.21:CVE-2021-23337 \
  --format json --output explains/lodash.json
```
|
||||
|
||||
Outputs ordered rule hits, inputs, and sealed-mode hints.
|
||||
|
||||
---
|
||||
|
||||
## 6 · Exit Codes Summary
|
||||
|
||||
| Exit code | Description | Typical ERR codes |
|
||||
|-----------|-------------|-------------------|
|
||||
| `0` | Success (command completed, warnings only). | — |
|
||||
| `10` | DSL syntax/compile failure. | `ERR_POL_001` |
|
||||
| `11` | Unsupported DSL version / schema mismatch. | `ERR_POL_001` |
|
||||
| `12` | Approval/rbac failure. | `ERR_POL_002`, `ERR_POL_005` |
|
||||
| `20` | Simulation diff exceeded thresholds (`--fail-on-diff`). | — |
|
||||
| `21` | Required inputs missing (SBOM/advisory/VEX). | `ERR_POL_003` |
|
||||
| `22` | Determinism guard triggered. | `ERR_POL_004` |
|
||||
| `23` | Run canceled or timed out. | `ERR_POL_006` |
|
||||
| `30` | Network/transport error (non-HTTP success). | — |
|
||||
| `64` | CLI usage error (invalid flag/argument). | — |
|
||||
|
||||
All non-zero exits emit a structured error envelope on stderr when `--format json` is used or `STELLA_JSON_ERRORS=1` is set.
|
||||
|
||||
---
|
||||
|
||||
## 7 · Offline & Air-Gap Usage
|
||||
|
||||
- Use `--sealed` to ensure commands avoid outbound calls; required for sealed enclaves.
|
||||
- `stella policy bundle export --policy P-7 --version 4 --output bundles/policy-P-7-v4.bundle` pairs with Offline Kit import.
|
||||
- Replay bundles (`run replay`) are DSSE-signed; verify with `stella offline verify`.
|
||||
- Store credentials in `~/.stella/offline.toml` for non-interactive air-gapped pipelines.
|
||||
|
||||
---
|
||||
|
||||
## 8 · Compliance Checklist
|
||||
|
||||
- [ ] **Help text synced:** `stella policy --help` matches documented flags/examples (update during release pipeline).
|
||||
- [ ] **Exit codes mapped:** Table above reflects CLI implementation and CI asserts mapping for `ERR_POL_*`.
|
||||
- [ ] **JSON schemas verified:** Example payloads validated against OpenAPI/SDK contracts before publishing.
|
||||
- [ ] **Scope guidance present:** Each command lists required Authority scopes.
|
||||
- [ ] **Offline guidance included:** Sealed-mode steps and bundle workflows documented.
|
||||
- [ ] **Cross-links tested:** Links to DSL, lifecycle, runs, and API docs resolve locally (`yarn docs:lint`).
|
||||
- [ ] **Examples no-op safe:** Command examples either read-only or use placeholders (no destructive defaults).
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 20).*
|
||||
229
docs/deploy/console.md
Normal file
@@ -0,0 +1,229 @@
|
||||
# Deploying the StellaOps Console
|
||||
|
||||
> **Audience:** Deployment Guild, Console Guild, operators rolling out the web console.
|
||||
> **Scope:** Helm and Docker Compose deployment steps, ingress/TLS configuration, required environment variables, health checks, offline/air-gap operation, and compliance checklist (Sprint 23).
|
||||
|
||||
The StellaOps Console ships as part of the `stellaops` stack Helm chart and Compose bundles maintained under `deploy/`. This guide describes the supported deployment paths, the configuration surface, and operational checks needed to run the console in connected or air-gapped environments.
|
||||
|
||||
---
|
||||
|
||||
## 1. Prerequisites
|
||||
|
||||
- Kubernetes cluster (v1.28+) with ingress controller (NGINX, Traefik, or equivalent) and Cert-Manager for automated TLS, or Docker host for Compose deployments.
|
||||
- Container registry access to `registry.stella-ops.org` (or mirrored registry) for all images listed in `deploy/releases/*.yaml`.
|
||||
- Authority service configured with console client (`aud=ui`, scopes `ui.read`, `ui.admin`).
|
||||
- DNS entry pointing to the console hostname (for example, `console.acme.internal`).
|
||||
- Cosign public key for manifest verification (`deploy/releases/manifest.json.sig`).
|
||||
- Optional: Offline Kit bundle for air-gapped sites (`stella-ops-offline-kit-<ver>.tar.gz`).
|
||||
|
||||
---
|
||||
|
||||
## 2. Helm deployment (recommended)
|
||||
|
||||
### 2.1 Install chart repository
|
||||
|
||||
```bash
|
||||
helm repo add stellaops https://downloads.stella-ops.org/helm
|
||||
helm repo update stellaops
|
||||
```
|
||||
|
||||
If operating offline, copy the chart archive from the Offline Kit (`deploy/helm/stellaops-<ver>.tgz`) and run:
|
||||
|
||||
```bash
|
||||
helm install stellaops ./stellaops-<ver>.tgz --namespace stellaops --create-namespace
|
||||
```
|
||||
|
||||
### 2.2 Base installation
|
||||
|
||||
```bash
helm install stellaops stellaops/stellaops \
  --namespace stellaops \
  --create-namespace \
  --values deploy/helm/stellaops/values-prod.yaml
```
|
||||
|
||||
The chart deploys Authority, Console web/API gateway, Scanner API, Scheduler, and supporting services. The console frontend pod is labelled `app=stellaops-web-ui`.
|
||||
|
||||
### 2.3 Helm values highlights
|
||||
|
||||
Key sections in `deploy/helm/stellaops/values-prod.yaml`:
|
||||
|
||||
| Path | Description |
|
||||
|------|-------------|
|
||||
| `console.ingress.host` | Hostname served by the console (`console.example.com`). |
|
||||
| `console.ingress.tls.secretName` | Kubernetes secret containing TLS certificate (generated by Cert-Manager or uploaded manually). |
|
||||
| `console.config.apiGateway.baseUrl` | Internal base URL the UI uses to reach the gateway (defaults to `https://stellaops-web`). |
|
||||
| `console.env.AUTHORITY_ISSUER` | Authority issuer URL (for example, `https://authority.example.com`). |
|
||||
| `console.env.AUTHORITY_CLIENT_ID` | Authority client ID for the console UI. |
|
||||
| `console.env.AUTHORITY_SCOPES` | Space-separated scopes required by UI (`ui.read ui.admin`). |
|
||||
| `console.resources` | CPU/memory requests and limits (default 250m CPU / 512Mi memory). |
|
||||
| `console.podAnnotations` | Optional annotations for service mesh or monitoring. |
|
||||
|
||||
Use `values-stage.yaml`, `values-dev.yaml`, or `values-airgap.yaml` as templates for other environments.
|
||||
|
||||
### 2.4 TLS and ingress
|
||||
|
||||
Example ingress override:
|
||||
|
||||
```yaml
console:
  ingress:
    enabled: true
    className: nginx
    host: console.acme.internal
    tls:
      enabled: true
      secretName: console-tls
```
|
||||
|
||||
Generate certificates using Cert-Manager or provide an existing secret. For air-gapped deployments, pre-create the secret with the mirrored CA chain.
|
||||
|
||||
### 2.5 Health checks
|
||||
|
||||
Console pods expose:
|
||||
|
||||
| Path | Purpose | Notes |
|
||||
|------|---------|-------|
|
||||
| `/health/live` | Liveness probe | Confirms process responsive. |
|
||||
| `/health/ready` | Readiness probe | Verifies configuration bootstrap and Authority reachability. |
|
||||
| `/metrics` | Prometheus metrics | Enabled when `console.metrics.enabled=true`. |
|
||||
|
||||
Helm chart sets default probes (`initialDelaySeconds: 10`, `periodSeconds: 15`). Adjust via `console.livenessProbe` and `console.readinessProbe`.
|
||||
|
||||
---
|
||||
|
||||
## 3. Docker Compose deployment
|
||||
|
||||
Located in `deploy/compose/docker-compose.console.yaml`. Quick start:
|
||||
|
||||
```bash
|
||||
cd deploy/compose
|
||||
docker compose -f docker-compose.console.yaml --env-file console.env up -d
|
||||
```
|
||||
|
||||
`console.env` should define:
|
||||
|
||||
```
|
||||
CONSOLE_PUBLIC_BASE_URL=https://console.acme.internal
|
||||
AUTHORITY_ISSUER=https://authority.acme.internal
|
||||
AUTHORITY_CLIENT_ID=console-ui
|
||||
AUTHORITY_CLIENT_SECRET=<if using confidential client>
|
||||
AUTHORITY_SCOPES=ui.read ui.admin
|
||||
CONSOLE_GATEWAY_BASE_URL=https://api.acme.internal
|
||||
```
|
||||
|
||||
The compose bundle includes Traefik as reverse proxy with TLS termination. Update `traefik/dynamic/console.yml` for custom certificates or additional middlewares (CSP headers, rate limits).
|
||||
|
||||
---
|
||||
|
||||
## 4. Environment variables
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `CONSOLE_PUBLIC_BASE_URL` | External URL used for redirects, deep links, and telemetry. | None (required). |
|
||||
| `CONSOLE_GATEWAY_BASE_URL` | URL of the web gateway that proxies API calls (`/console/*`). | Chart service name. |
|
||||
| `AUTHORITY_ISSUER` | Authority issuer (`https://authority.example.com`). | None (required). |
|
||||
| `AUTHORITY_CLIENT_ID` | OIDC client configured in Authority. | None (required). |
|
||||
| `AUTHORITY_SCOPES` | Space-separated scopes assigned to the console client. | `ui.read ui.admin`. |
|
||||
| `AUTHORITY_DPOP_ENABLED` | Enables DPoP challenge/response (recommended true). | `true`. |
|
||||
| `CONSOLE_FEATURE_FLAGS` | Comma-separated feature flags (`runs`, `downloads.offline`, etc.). | `runs,downloads,policies`. |
|
||||
| `CONSOLE_LOG_LEVEL` | Minimum log level (`Information`, `Debug`, etc.). | `Information`. |
|
||||
| `CONSOLE_METRICS_ENABLED` | Expose `/metrics` endpoint. | `true`. |
|
||||
| `CONSOLE_SENTRY_DSN` | Optional error reporting DSN. | Blank. |
|
||||
|
||||
When running behind additional proxies, set `ASPNETCORE_FORWARDEDHEADERS_ENABLED=true` to honour `X-Forwarded-*` headers.
|
||||
|
||||
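Since three of the variables above have no default, a short pre-flight check on the env file can catch omissions before `docker compose up`. This is a minimal sketch: `check_console_env` is a hypothetical helper, and it only looks for `NAME=value` lines with a non-empty value, without validating the values themselves.

```bash
# Fail when any variable marked "None (required)" in the table is absent
# or empty in the given env file (for example console.env).
check_console_env() {
  env_file=$1
  env_missing=""
  for env_name in CONSOLE_PUBLIC_BASE_URL AUTHORITY_ISSUER AUTHORITY_CLIENT_ID; do
    if ! grep -Eq "^${env_name}=.+" "$env_file"; then
      env_missing="$env_missing $env_name"
    fi
  done
  if [ -n "$env_missing" ]; then
    echo "missing required variables:$env_missing" >&2
    return 1
  fi
}

# Usage:
#   check_console_env console.env && docker compose -f docker-compose.console.yaml --env-file console.env up -d
```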
---
|
||||
|
||||
## 5. Security headers and CSP
|
||||
|
||||
The console serves a strict Content Security Policy (CSP) by default:
|
||||
|
||||
```
|
||||
default-src 'self';
|
||||
connect-src 'self' https://*.stella-ops.local;
|
||||
script-src 'self';
|
||||
style-src 'self' 'unsafe-inline';
|
||||
img-src 'self' data:;
|
||||
font-src 'self';
|
||||
frame-ancestors 'none';
|
||||
```
|
||||
|
||||
Adjust via `console.config.cspOverrides` if additional domains are required. For integrations embedding the console, update OIDC redirect URIs and Authority scopes accordingly.
|
||||
|
||||
TLS recommendations:
|
||||
|
||||
- Use TLS 1.2+ with modern cipher suite policy.
|
||||
- Enable HSTS (`Strict-Transport-Security: max-age=31536000; includeSubDomains`).
|
||||
- Provide custom trust bundles via `console.config.trustBundleSecret` when using private CAs.
|
||||
|
||||
---
|
||||
|
||||
## 6. Logging and metrics
|
||||
|
||||
- Structured logs emitted to stdout with correlation IDs. Configure log shipping via Fluent Bit or similar.
|
||||
- Metrics available at `/metrics` in Prometheus format. Key metrics include `ui_request_duration_seconds`, `ui_tenant_switch_total`, and `ui_download_manifest_refresh_seconds`.
|
||||
- Enable OpenTelemetry exporter by setting `OTEL_EXPORTER_OTLP_ENDPOINT` and associated headers in environment variables.
|
||||
|
||||
---
|
||||
|
||||
## 7. Offline and air-gap deployment
|
||||
|
||||
- Mirror container images using the Downloads workspace or Offline Kit manifest. Example:
|
||||
|
||||
```bash
oras copy registry.stella-ops.org/stellaops/web-ui@sha256:<digest> \
  registry.airgap.local/stellaops/web-ui:2025.10.0
```
|
||||
|
||||
- Import Offline Kit using `stella ouk import` before starting the console so manifest parity checks succeed.
|
||||
- Use `values-airgap.yaml` to disable external telemetry endpoints and configure internal certificate chains.
|
||||
- Run `helm upgrade --install` using the mirrored chart (`stellaops-<ver>.tgz`) and set `console.offlineMode=true` to surface offline banners.
|
||||
|
||||
---
|
||||
|
||||
## 8. Health checks and remediation
|
||||
|
||||
| Check | Command | Expected result |
|-------|---------|-----------------|
| Pod status | `kubectl get pods -n stellaops` | `Running` state with restarts = 0. |
| Liveness | `kubectl exec -n stellaops deploy/stellaops-web-ui -- curl -fsS http://localhost:8080/health/live` | Returns `{"status":"Healthy"}`. |
| Readiness | `kubectl exec -n stellaops deploy/stellaops-web-ui -- curl -fsS http://localhost:8080/health/ready` | Returns `{"status":"Ready"}`. |
| Gateway reachability | `curl -I https://console.example.com/api/console/status` | `200 OK` with CSP headers. |
| Static assets | `curl -I https://console.example.com/static/assets/app.js` | `200 OK` with long cache headers. |
|
||||
|
||||
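Right after a rollout the probes may fail for a short time, so scripted checks usually retry before reporting a failure. The helper below is a generic sketch with a hypothetical name (`wait_until`); the attempt count and delay are choices for the caller, not values taken from the chart.

```bash
# Retry a probe command until it succeeds or the attempts run out.
# Arguments: <attempts> <delay-seconds> <command> [args...]
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    [ "$wu_i" -lt "$wu_attempts" ] && sleep "$wu_delay"
    wu_i=$((wu_i + 1))
  done
  return 1
}

# Usage (hostname as in the table above):
#   wait_until 10 15 curl -fsSI https://console.example.com/api/console/status
```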
Troubleshooting steps:
|
||||
|
||||
- **Authority unreachable:** readiness fails with `AUTHORITY_UNREACHABLE`. Check DNS, trust bundles, and Authority service health.
|
||||
- **Manifest mismatch:** console logs `DOWNLOAD_MANIFEST_SIGNATURE_INVALID`. Verify cosign key and re-sync manifest.
|
||||
- **Ingress 404:** ensure ingress controller routes host to `stellaops-web-ui` service; check TLS secret name.
|
||||
- **SSE blocked:** confirm proxy allows HTTP/1.1 and disables buffering on `/console/runs/*`.
|
||||
|
||||
---
|
||||
|
||||
## 9. References
|
||||
|
||||
- `deploy/helm/stellaops/values-*.yaml` - environment-specific overrides.
|
||||
- `deploy/compose/docker-compose.console.yaml` - Compose bundle.
|
||||
- `/docs/ui/downloads.md` - manifest and offline bundle guidance.
|
||||
- `/docs/security/console-security.md` (pending) - CSP and Authority scopes.
|
||||
- `/docs/24_OFFLINE_KIT.md` - Offline kit packaging and verification.
|
||||
- `/docs/ops/deployment-runbook.md` (pending) - wider platform deployment steps.
|
||||
|
||||
---
|
||||
|
||||
## 10. Compliance checklist
|
||||
|
||||
- [ ] Helm and Compose instructions verified against `deploy/` assets.
|
||||
- [ ] Ingress/TLS guidance aligns with Security Guild recommendations.
|
||||
- [ ] Environment variables documented with defaults and required values.
|
||||
- [ ] Health/liveness/readiness endpoints tested and listed.
|
||||
- [ ] Offline workflow (mirrors, manifest parity) captured.
|
||||
- [ ] Logging and metrics guidance lists the documented metrics.
|
||||
- [ ] CSP and security header defaults stated alongside override guidance.
|
||||
- [ ] Troubleshooting section linked to relevant runbooks.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-27 (Sprint 23).*
|
||||
|
||||
135
docs/deploy/containers.md
Normal file
@@ -0,0 +1,135 @@
|
||||
# Container Deployment Guide — AOC Update
|
||||
|
||||
> **Audience:** DevOps Guild, platform operators deploying StellaOps services.
|
||||
> **Scope:** Deployment configuration changes required by the Aggregation-Only Contract (AOC), including schema validators, guard environment flags, and verifier identities.
|
||||
|
||||
This guide supplements existing deployment manuals with AOC-specific configuration. It assumes familiarity with the base Compose/Helm manifests described in `ops/deployment/` and `docs/ARCHITECTURE_DEVOPS.md`.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Schema validator enablement
|
||||
|
||||
### 1.1 MongoDB validators
|
||||
|
||||
- Apply JSON schema validators to `advisory_raw` and `vex_raw` collections before enabling AOC guards.
|
||||
- Use the migration script provided in `ops/devops/scripts/apply-aoc-validators.js`:
|
||||
|
||||
```bash
kubectl exec -n concelier deploy/concelier-mongo -- \
  mongo concelier ops/devops/scripts/apply-aoc-validators.js

kubectl exec -n excititor deploy/excititor-mongo -- \
  mongo excititor ops/devops/scripts/apply-aoc-validators.js
```
|
||||
|
||||
- Validators enforce required fields (`tenant`, `source`, `upstream`, `linkset`) and reject forbidden keys at DB level.
|
||||
- Rollback plan: validators are applied with `validationLevel: moderate`—downgrade via the same script with `--remove` if required.
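To preview which documents the validators would reject before applying them, the required-field rule can be approximated client-side. This is a sketch under assumptions: the four required fields come from the list above, but the set of forbidden keys is not enumerated in this guide, so it is left as a caller-supplied parameter, and the function name is hypothetical.

```python
# Required fields as listed in this guide; the real validator lives in
# apply-aoc-validators.js and may enforce more than this sketch does.
REQUIRED_FIELDS = ("tenant", "source", "upstream", "linkset")


def validate_raw_document(doc: dict, forbidden_keys=frozenset()) -> list:
    """Return a list of human-readable violations (empty when the doc passes).

    forbidden_keys is a placeholder: this guide does not list the forbidden
    keys, so callers must supply them from the AOC reference.
    """
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in doc
    ]
    # Iterating a dict yields its top-level keys.
    for key in sorted(set(forbidden_keys).intersection(doc)):
        errors.append(f"forbidden key present: {key}")
    return errors
```

Running this over a sample of `advisory_raw` or `vex_raw` documents in staging gives an early estimate of how many writes a stricter validation level would reject.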
|
||||
|
||||
### 1.2 Migration order
|
||||
|
||||
1. Deploy validators in maintenance window.
|
||||
2. Roll out Concelier/Excititor images with guard middleware enabled (`AOC_GUARD_ENABLED=true`).
|
||||
3. Run smoke tests (`stella sources ingest --dry-run` fixtures) before resuming production ingestion.
|
||||
|
||||
---
|
||||
|
||||
## 2 · Container environment flags
|
||||
|
||||
Add the following environment variables to Concelier/Excititor deployments:
|
||||
|
||||
| Variable | Default | Description |
|----------|---------|-------------|
| `AOC_GUARD_ENABLED` | `true` | Enables `AOCWriteGuard` interception. Set `false` only for controlled rollback. |
| `AOC_ALLOW_SUPERSEDES_RETROFIT` | `false` | Allows temporary supersedes backfill during migration. Remove after cutover. |
| `AOC_METRICS_ENABLED` | `true` | Emits `ingestion_write_total`, `aoc_violation_total`, etc. |
| `AOC_TENANT_HEADER` | `X-Stella-Tenant` | Header name expected from Gateway. |
| `AOC_VERIFIER_USER` | `stella-aoc-verify` | Read-only service user used by UI/CLI verification. |
|
||||
|
||||
Compose snippet:
|
||||
|
||||
```yaml
environment:
  - AOC_GUARD_ENABLED=true
  - AOC_ALLOW_SUPERSEDES_RETROFIT=false
  - AOC_METRICS_ENABLED=true
  - AOC_TENANT_HEADER=X-Stella-Tenant
  - AOC_VERIFIER_USER=stella-aoc-verify
```
|
||||
|
||||
Ensure `AOC_VERIFIER_USER` exists in Authority with `aoc:verify` scope and no write permissions.
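A deployment pre-flight script can resolve these flags the same way the table documents them. The sketch below uses the variable names and defaults from the table; how Concelier and Excititor actually parse the values is not specified here, so the strict `true`/`false` parsing and the function name are assumptions.

```python
import os

# Names and defaults copied from the environment-variable table.
DEFAULTS = {
    "AOC_GUARD_ENABLED": "true",
    "AOC_ALLOW_SUPERSEDES_RETROFIT": "false",
    "AOC_METRICS_ENABLED": "true",
    "AOC_TENANT_HEADER": "X-Stella-Tenant",
    "AOC_VERIFIER_USER": "stella-aoc-verify",
}
BOOLEAN_FLAGS = {
    "AOC_GUARD_ENABLED",
    "AOC_ALLOW_SUPERSEDES_RETROFIT",
    "AOC_METRICS_ENABLED",
}


def load_aoc_settings(env=None) -> dict:
    """Resolve AOC settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name, default).strip()
        if name in BOOLEAN_FLAGS:
            # Assumption: only "true"/"false" (any case) are accepted.
            if raw.lower() not in ("true", "false"):
                raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")
            settings[name] = raw.lower() == "true"
        else:
            settings[name] = raw
    return settings
```

Failing fast on an unexpected value such as `AOC_GUARD_ENABLED=yes` avoids a rollout where the guard is silently left in an unintended state.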
|
||||
|
||||
---
|
||||
|
||||
## 3 · Verifier identity
|
||||
|
||||
- Create a dedicated client (`stella-aoc-verify`) via Authority bootstrap:
|
||||
|
||||
```yaml
clients:
  - clientId: stella-aoc-verify
    grantTypes: [client_credentials]
    scopes: [aoc:verify, advisory:verify, vex:verify]
    tenants: [default]
```
|
||||
|
||||
- Store credentials in a secret store (Kubernetes Secret or Docker Swarm secret).
|
||||
- Bind credentials to `stella aoc verify` CI jobs and Console verification service.
|
||||
- Rotate quarterly; document in `ops/authority-key-rotation.md`.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Deployment steps
|
||||
|
||||
1. **Pre-checks:** Confirm that database backups exist, alerting is in maintenance mode, and the staging environment has been validated.
|
||||
2. **Apply validators:** Run scripts per § 1.1.
|
||||
3. **Update manifests:** Inject environment variables (§ 2) and mount guard configuration configmaps.
|
||||
4. **Redeploy services:** Perform a rolling restart of the Concelier/Excititor pods. Monitor `ingestion_write_total` for steady throughput.
|
||||
5. **Seed verifier:** Deploy read-only verifier user and store credentials.
|
||||
6. **Run verification:** Execute `stella aoc verify --since 24h` and ensure exit code `0`.
|
||||
7. **Update dashboards:** Point Grafana panels to new metrics (`aoc_violation_total`).
|
||||
8. **Record handoff:** Capture console screenshots and verification logs for release notes.
|
||||
|
||||
---
|
||||
|
||||
## 5 · Offline Kit updates
|
||||
|
||||
- Ship validator scripts with Offline Kit (`offline-kit/scripts/apply-aoc-validators.js`).
|
||||
- Include pre-generated verification reports for air-gapped deployments.
|
||||
- Document offline CLI workflow in bundle README referencing `docs/cli/cli-reference.md`.
|
||||
- Ensure `stella-aoc-verify` credentials are scoped to offline tenant and rotated during bundle refresh.
|
||||
|
||||
---
|
||||
|
||||
## 6 · Rollback plan
|
||||
|
||||
1. Disable the guard by setting `AOC_GUARD_ENABLED=false` on Concelier/Excititor and roll out the change.
|
||||
2. Remove validators with the migration script (`--remove`).
|
||||
3. Pause verification jobs to prevent noise.
|
||||
4. Investigate and remediate upstream issues before re-enabling guards.
|
||||
|
||||
---
|
||||
|
||||
## 7 · References
|
||||
|
||||
- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
|
||||
- [Authority scopes & tenancy](../security/authority-scopes.md)
|
||||
- [Observability guide](../observability/observability.md)
|
||||
- [CLI AOC commands](../cli/cli-reference.md)
|
||||
- [Concelier architecture](../ARCHITECTURE_CONCELIER.md)
|
||||
- [Excititor architecture](../ARCHITECTURE_EXCITITOR.md)
|
||||
|
||||
---
|
||||
|
||||
## 8 · Compliance checklist
|
||||
|
||||
- [ ] Validators documented and scripts referenced for online/offline deployments.
|
||||
- [ ] Environment variables cover guard enablement, metrics, and tenant header.
|
||||
- [ ] Read-only verifier user installation steps included.
|
||||
- [ ] Offline kit instructions align with validator/verification workflow.
|
||||
- [ ] Rollback procedure captured.
|
||||
- [ ] Cross-links to AOC docs, Authority scopes, and observability guides present.
|
||||
- [ ] DevOps Guild sign-off tracked (owner: @devops-guild, due 2025-10-29).
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 19).*
|
||||
25
docs/devops/policy-schema-export.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# Policy Schema Export Automation
|
||||
|
||||
This utility generates JSON Schema documents for the Policy Engine run contracts.
|
||||
|
||||
## Command
|
||||
|
||||
```
scripts/export-policy-schemas.sh [output-directory]
```
|
||||
|
||||
When no output directory is supplied, schemas are written to `docs/schemas/`.
|
||||
|
||||
The exporter builds against `StellaOps.Scheduler.Models` and emits:
|
||||
|
||||
- `policy-run-request.schema.json`
|
||||
- `policy-run-status.schema.json`
|
||||
- `policy-diff-summary.schema.json`
|
||||
- `policy-explain-trace.schema.json`
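A pipeline step can confirm that the exporter produced all four documents before publishing artifacts. This is a sketch, assuming the default `docs/schemas/` output directory described above; the function name is hypothetical and the check only tests for file presence, not schema validity.

```python
from pathlib import Path

# File names as listed in this document.
EXPECTED_SCHEMAS = (
    "policy-run-request.schema.json",
    "policy-run-status.schema.json",
    "policy-diff-summary.schema.json",
    "policy-explain-trace.schema.json",
)


def missing_schemas(output_dir="docs/schemas") -> list:
    """Return the expected schema files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_SCHEMAS if not (root / name).is_file()]
```

A CI job could fail the build whenever `missing_schemas()` returns a non-empty list.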
|
||||
|
||||
## CI integration checklist
|
||||
|
||||
- [ ] Invoke the script in the DevOps pipeline (see `DEVOPS-POLICY-20-004`).
|
||||
- [ ] Publish the generated schemas as pipeline artifacts.
|
||||
- [ ] Notify downstream consumers when schemas change (Slack `#policy-engine`, changelog snippet).
|
||||
- [ ] Gate CLI validation once schema artifacts are available.
|
||||
@@ -2,28 +2,53 @@
|
||||
|
||||
Platform services publish strongly typed events; the JSON Schemas in this directory define those envelopes. File names follow `<event-name>@<version>.json` so producers and consumers can negotiate contracts explicitly.
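Tooling that walks this directory can recover the event name and version from the `<event-name>@<version>.json` convention. The sketch below assumes event names contain only lowercase letters, digits, dots, underscores, and hyphens, which matches the names in the catalog but is not stated as a rule; the function name is illustrative.

```python
import re

# Assumption: event names are lowercase dotted identifiers and versions
# are positive integers without leading zeros.
_SCHEMA_NAME = re.compile(r"^(?P<event>[a-z0-9._-]+)@(?P<version>[1-9][0-9]*)\.json$")


def parse_schema_filename(filename: str):
    """Split '<event-name>@<version>.json' into (event_name, version)."""
    match = _SCHEMA_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned schema file name: {filename!r}")
    return match.group("event"), int(match.group("version"))
```

Sample payloads (`*.sample.json`) deliberately do not match this pattern, so a directory scan built on it picks up schemas only.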
|
||||
|
||||
## Catalog
|
||||
- `scanner.report.ready@1.json` — emitted by Scanner.WebService once a signed report is persisted (payload embeds the canonical report plus DSSE envelope). Consumers: Notify, UI timeline.
|
||||
- `scanner.scan.completed@1.json` — emitted alongside the signed report to capture scan outcomes/summary data for downstream automation. Consumers: Notify, Scheduler backfills, UI timelines.
|
||||
- `scheduler.rescan.delta@1.json` — emitted by Scheduler when BOM-Index diffs require fresh scans. Consumers: Notify, Policy Engine.
|
||||
- `attestor.logged@1.json` — emitted by Attestor after storing the Rekor inclusion proof. Consumers: UI attestation panel, Governance exports.
|
||||
## Catalog
|
||||
**Orchestrator envelopes (ORCH-SVC-38-101)**
|
||||
- `scanner.event.report.ready@1.json` — orchestrator event emitted when a signed report is persisted. Supersedes the legacy `scanner.report.ready@1` schema and adds versioning, idempotency keys, and trace context. Consumers: Orchestrator bus, Notifications Studio, UI timeline.
|
||||
- `scanner.event.scan.completed@1.json` — orchestrator event emitted when a scan run finishes. Supersedes the legacy `scanner.scan.completed@1` schema. Consumers: Orchestrator bus, Notifications Studio, Scheduler replay tooling.
|
||||
|
||||
**Legacy envelopes (Redis-backed)**
|
||||
- `scanner.report.ready@1.json` — legacy Redis stream event emitted once a signed report is persisted (kept for transitional compatibility).
|
||||
- `scanner.scan.completed@1.json` — legacy Redis stream event emitted alongside the signed report for automation.
|
||||
- `scheduler.rescan.delta@1.json` — emitted by Scheduler when BOM-Index diffs require fresh scans. Consumers: Notify, Policy Engine.
|
||||
- `scheduler.graph.job.completed@1.json` — emitted when a Cartographer graph build/overlay job finishes (`status = completed|failed|cancelled`). Consumers: Scheduler WebService (lag metrics/API), Cartographer cache warmers, UI overlay freshness indicators.
|
||||
- `attestor.logged@1.json` — emitted by Attestor after storing the Rekor inclusion proof. Consumers: UI attestation panel, Governance exports.
|
||||
|
||||
Additive payload changes (new optional fields) can stay within the same version. Any breaking change (removing a field, tightening validation, altering semantics) must increment the `@<version>` suffix and update downstream consumers.
|
||||
Additive payload changes (new optional fields) can stay within the same version. Any breaking change (removing a field, tightening validation, altering semantics) must increment the `@<version>` suffix and update downstream consumers. For full orchestrator guidance see [`orchestrator-scanner-events.md`](orchestrator-scanner-events.md).
|
||||
|
||||
## Envelope structure
|
||||
All event envelopes share the same deterministic header. Use the following table as the quick reference when emitting or parsing events:
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| `eventId` | `uuid` | Must be globally unique per occurrence; producers log duplicates as fatal. |
|
||||
| `kind` | `string` | Fixed per schema (e.g., `scanner.report.ready`). Downstream services reject unknown kinds or versions. |
|
||||
| `tenant` | `string` | Multi‑tenant isolation key; mirror the value recorded in queue/Mongo metadata. |
|
||||
| `ts` | `date-time` | RFC 3339 UTC timestamp. Use monotonic clocks or atomic offsets so ordering survives retries. |
|
||||
| `scope` | `object` | Optional block used when the event concerns a specific image or repository. See schema for required fields (e.g., `repo`, `digest`). |
|
||||
| `payload` | `object` | Event-specific body. Schemas allow additional properties so producers can add optional hints (e.g., `reportId`, `quietedFindingCount`) without breaking consumers. For scanner events, payloads embed both the canonical report document and the DSSE envelope so consumers can reuse signatures without recomputing them. See `docs/runtime/SCANNER_RUNTIME_READINESS.md` for the runtime consumer checklist covering these hints. |
|
||||
## Envelope structure
|
||||
|
||||
### Orchestrator envelope (version 1)
|
||||
| Field | Type | Notes |
|-------|------|-------|
| `eventId` | `uuid` | Globally unique per occurrence. |
| `kind` | `string` | e.g., `scanner.event.report.ready`. |
| `version` | `integer` | Schema version (`1` for the initial release). |
| `tenant` | `string` | Multi-tenant isolation key; mirror the value recorded in queue/Mongo metadata. |
| `occurredAt` | `date-time` | RFC 3339 UTC timestamp describing when the state transition happened. |
| `recordedAt` | `date-time` | RFC 3339 UTC timestamp for durable persistence (optional but recommended). |
| `source` | `string` | Producer identifier (`scanner.webservice`). |
| `idempotencyKey` | `string` | Deterministic dedupe key (`scanner.event.*:<tenant>:<report\|scan>`). |
| `correlationId` | `string` | Ties the event to the originating scan/API request. |
| `traceId` / `spanId` | `string` | W3C trace context propagated into downstream telemetry. |
| `scope` | `object` | Optional block with at least `repo` and `digest`. |
| `payload` | `object` | Event-specific body; schemas embed the canonical report and DSSE envelope. |
| `attributes` | `object` | Optional metadata bag (`string` keys/values) for downstream correlation. |
|
||||
|
||||
For Scanner orchestrator events, `links` include console and API deep links (`ui`, `report`, and `policy`) plus an optional `attestation` URL when a DSSE envelope is present. See [`orchestrator-scanner-events.md`](orchestrator-scanner-events.md) for details.
|
||||
|
||||
### Legacy Redis envelope
|
||||
| Field | Type | Notes |
|-------|------|-------|
| `eventId` | `uuid` | Must be globally unique per occurrence; producers log duplicates as fatal. |
| `kind` | `string` | Fixed per schema (e.g., `scanner.report.ready`). Downstream services reject unknown kinds or versions. |
| `tenant` | `string` | Multi-tenant isolation key; mirror the value recorded in queue/Mongo metadata. |
| `ts` | `date-time` | RFC 3339 UTC timestamp. Use monotonic clocks or atomic offsets so ordering survives retries. |
| `scope` | `object` | Optional block used when the event concerns a specific image or repository. See schema for required fields (e.g., `repo`, `digest`). |
| `payload` | `object` | Event-specific body. Schemas allow additional properties so producers can add optional hints (e.g., `reportId`, `quietedFindingCount`) without breaking consumers. See `docs/runtime/SCANNER_RUNTIME_READINESS.md` for the runtime consumer checklist covering these hints. |
| `attributes` | `object` | Optional metadata bag (`string` keys/values) for downstream correlation (e.g., pipeline identifiers). Omit when unused to keep payloads concise. |
|
||||
|
||||
When adding new optional fields, document the behaviour in the schema’s `description` block and update the consumer checklist in the next sprint sync.
|
||||
|
||||
When adding new optional fields, document the behaviour in the schema’s `description` block and update the consumer checklist in the next sprint sync.
|
||||
|
||||
## Canonical samples & validation
|
||||
Reference payloads live under `docs/events/samples/`, mirroring the schema version (`<event-name>@<version>.sample.json`). They illustrate common field combinations, including the optional attributes that downstream teams rely on for UI affordances and audit trails. Scanner samples reuse the exact DSSE envelope checked into `samples/api/reports/report-sample.dsse.json`, and unit tests (`ReportSamplesTests`, `PlatformEventSchemaValidationTests`) guard that payloads stay canonical and continue to satisfy the published schemas.
|
||||
|
||||
121
docs/events/orchestrator-scanner-events.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# Scanner Orchestrator Events (ORCH-SVC-38-101)
|
||||
|
||||
Last updated: 2025-10-26
|
||||
|
||||
The Notifications Studio initiative (NOTIFY-SVC-38-001) and orchestrator backlog (ORCH-SVC-38-101) standardise how platform services emit lifecycle events. This document describes the Scanner WebService contract for the new **orchestrator envelopes** (`scanner.event.*`) and how they supersede the legacy Redis-backed `scanner.report.ready` / `scanner.scan.completed` events.
|
||||
|
||||
## 1. Envelope overview
|
||||
|
||||
Orchestrator events share a deterministic JSON envelope:
|
||||
|
||||
| Field | Type | Notes |
|-------|------|-------|
| `eventId` | `uuid` | Globally unique identifier generated per occurrence. |
| `kind` | `string` | Event identifier; Scanner emits `scanner.event.report.ready` and `scanner.event.scan.completed`. |
| `version` | `integer` | Schema version. Initial release uses `1`. |
| `tenant` | `string` | Tenant that owns the scan/report. Mirrors Authority claims. |
| `occurredAt` | `date-time` | UTC instant when the underlying state transition happened (e.g., report persisted). |
| `recordedAt` | `date-time` | UTC instant when the event was durably written. Optional but recommended. |
| `source` | `string` | Producer identifier (`scanner.webservice`). |
| `idempotencyKey` | `string` | Deterministic key for duplicate suppression (see §4). |
| `correlationId` | `string` | Maps back to the API request or scan identifier. |
| `traceId` / `spanId` | `string` | W3C trace context propagated into downstream telemetry. |
| `scope` | `object` | Describes the affected artefact. Requires `repo` and `digest`; optional `namespace`, `component`, `image`. |
| `attributes` | `object` | Flat string map for frequently queried metadata (e.g., policy revision). |
| `payload` | `object` | Event-specific body (see §2). |
|
||||
|
||||
Canonical schemas live under `docs/events/scanner.event.*@1.json`. Samples that round-trip through `NotifyCanonicalJsonSerializer` are stored in `docs/events/samples/`.
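Before running full JSON Schema validation, a consumer can apply a cheap structural pre-check on the envelope. The sketch below is an assumption-laden illustration: the required-field list mirrors the `required` array of the `scanner.event.report.ready@1.json` schema and is assumed to apply to both Scanner kinds, and the function name is hypothetical. It does not replace validation against the published schemas.

```python
# Mirrors the `required` array of scanner.event.report.ready@1.json;
# assumed (not confirmed here) to hold for scan.completed as well.
REQUIRED_ENVELOPE_FIELDS = (
    "eventId", "kind", "version", "tenant",
    "occurredAt", "source", "idempotencyKey", "payload",
)
KNOWN_KINDS = {"scanner.event.report.ready", "scanner.event.scan.completed"}


def envelope_problems(event: dict) -> list:
    """Return structural problems found in an orchestrator envelope."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_ENVELOPE_FIELDS
        if name not in event
    ]
    if "kind" in event and event["kind"] not in KNOWN_KINDS:
        problems.append(f"unknown kind: {event['kind']}")
    if "version" in event and event["version"] != 1:
        problems.append(f"unsupported version: {event['version']}")
    # scope is optional, but when present it requires repo and digest.
    scope = event.get("scope")
    if scope is not None:
        for key in ("repo", "digest"):
            if key not in scope:
                problems.append(f"scope missing: {key}")
    return problems
```

An empty list means the envelope is worth handing to the schema validator; anything else can be rejected early with a precise reason.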
|
||||
|
||||
## 2. Event kinds and payloads
|
||||
|
||||
### 2.1 `scanner.event.report.ready`
|
||||
|
||||
Emitted once a signed report is persisted and attested. Payload highlights:
|
||||
|
||||
- `reportId` / `scanId` — identifiers for the persisted report and originating scan. Until Scan IDs are surfaced by the API, `scanId` mirrors `reportId` so downstream correlators can stabilise on a single key.
|
||||
- **Attributes:** `reportId`, `policyRevisionId`, `policyDigest`, `verdict` — pre-sorted for deterministic routing.
|
||||
- **Links:**
  - `ui` → `/ui/reports/{reportId}` on the current host.
  - `report` → `{apiBasePath}/{reportsSegment}/{reportId}` (defaults to `/api/v1/reports/{reportId}`).
  - `policy` → `{apiBasePath}/{policySegment}/revisions/{revisionId}` when a revision is present.
  - `attestation` → `/ui/attestations/{reportId}` when a DSSE envelope is included.
|
||||
- `imageDigest` — OCI image digest associated with the analysis.
|
||||
- `generatedAt` — report generation timestamp (ISO-8601 UTC).
|
||||
- `verdict` — `pass`, `warn`, or `fail` after policy evaluation.
|
||||
- `summary` — blocked/warned/ignored/quieted counters (all non-negative integers).
|
||||
- `delta` — newly critical/high counts and optional `kev` array.
|
||||
- `quietedFindingCount` — mirrors `summary.quieted`.
|
||||
- `policy` — revision metadata (`digest`, `revisionId`) surfaced for routing.
|
||||
- `links` — UI/report/policy URLs suitable for operators.
|
||||
- `dsse` — embedded DSSE envelope (payload, type, signature list).
|
||||
- `report` — canonical report document; identical to the DSSE payload.
|
||||
|
||||
Schema: `docs/events/scanner.event.report.ready@1.json`
|
||||
Sample: `docs/events/samples/scanner.event.report.ready@1.sample.json`
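Consumers that want to inspect the signed document can decode the `dsse.payload` field, which is base64-encoded JSON in the samples. The following is a minimal sketch with an illustrative function name; it only decodes the payload and does not verify any signature.

```python
import base64
import json


def decode_dsse_payload(dsse: dict) -> dict:
    """Decode the base64 `payload` of a DSSE envelope into a JSON object.

    This does NOT verify signatures; it only exposes the signed document.
    """
    raw = base64.b64decode(dsse["payload"], validate=True)
    return json.loads(raw)
```

When comparing the decoded document with `payload.report`, note that a plain equality check is sensitive to formatting differences, for example how timestamps spell the UTC offset.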
|
||||
|
||||
### 2.2 `scanner.event.scan.completed`
|
||||
|
||||
Emitted after scan execution finishes (success or policy failure). Payload highlights:
|
||||
|
||||
- `reportId` / `scanId` / `imageDigest` — identifiers mirroring the report-ready event. As with the report-ready payload, `scanId` currently mirrors `reportId` as a temporary shim.
|
||||
- **Attributes:** `reportId`, `policyRevisionId`, `policyDigest`, `verdict`.
|
||||
- **Links:** same as above (`ui`, `report`, `policy`) with `attestation` populated when DSSE metadata exists.
|
||||
- `verdict`, `summary`, `delta`, `policy` — same semantics as above.
|
||||
- `findings` — array of surfaced findings with `id`, `severity`, optional `cve`, `purl`, and `reachability`.
|
||||
- `links`, `dsse`, `report` — same structure as §2.1 (allows Notifier to reuse signatures).
|
||||
|
||||
Schema: `docs/events/scanner.event.scan.completed@1.json`
|
||||
Sample: `docs/events/samples/scanner.event.scan.completed@1.sample.json`
|
||||
|
||||
### 2.3 Relationship to legacy events
|
||||
|
||||
| Legacy Redis event | Replacement orchestrator event | Notes |
|--------------------|-------------------------------|-------|
| `scanner.report.ready` | `scanner.event.report.ready` | Adds versioning, idempotency, trace context. Payload is a superset of the legacy fields. |
| `scanner.scan.completed` | `scanner.event.scan.completed` | Same data plus explicit scan identifiers and orchestrator metadata. |
|
||||
|
||||
Legacy schemas remain for backwards-compatibility during migration, but new integrations **must** target the orchestrator variants.
|
||||
|
||||
## 3. Deterministic serialization
|
||||
|
||||
- Producers must serialise events using `NotifyCanonicalJsonSerializer` to guarantee consistent key ordering and whitespace.
|
||||
- Timestamps (`occurredAt`, `recordedAt`, `payload.generatedAt`) use `DateTimeOffset.UtcDateTime.ToString("O")`.
|
||||
- Payload arrays (`delta.kev`, `findings`) should be pre-sorted (e.g., alphabetical CVE order) so hash-based consumers remain stable.
|
||||
- Optional fields are omitted rather than emitted as `null`.
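The effect of these rules can be illustrated outside .NET. The sketch below is not `NotifyCanonicalJsonSerializer` and makes no claim about its exact key order; it is a stand-in that assumes alphabetical key ordering and compact separators to show why omitting nulls and fixing the ordering yields byte-stable output.

```python
import json


def drop_none(value):
    """Recursively remove dict entries whose value is None.

    None items inside lists are kept, since removing them would shift
    positions; only optional object fields are omitted.
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def canonical_json(event: dict) -> str:
    """Serialise with sorted keys and no insignificant whitespace.

    Assumption: alphabetical ordering stands in for whatever ordering the
    real serializer applies.
    """
    return json.dumps(
        drop_none(event),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
```

Two producers that build the same event with fields in a different order, or with unset optionals, then emit identical bytes, which keeps hash-based consumers stable.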
|
||||
|
||||
## 4. Idempotency and correlation
|
||||
|
||||
Idempotency keys dedupe repeated publishes and align with the orchestrator’s outbox pattern:
|
||||
|
||||
| Event kind | Idempotency key template |
|------------|-------------------------|
| `scanner.event.report.ready` | `scanner.event.report.ready:<tenant>:<reportId>` |
| `scanner.event.scan.completed` | `scanner.event.scan.completed:<tenant>:<scanId>` |
|
||||
|
||||
Keys are ASCII lowercase; components should be trimmed and validated before concatenation. Retries must reuse the same key.
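The templates above can be sketched as a small helper. The trimming and lowercasing follow the text; the specific validation rules (no empty components, ASCII only, no colons or whitespace inside a component) are assumptions about what "validated" means, and the function name is hypothetical.

```python
def build_idempotency_key(kind: str, tenant: str, subject_id: str) -> str:
    """Build '<kind>:<tenant>:<reportId|scanId>' as an ASCII lowercase key."""
    parts = [part.strip().lower() for part in (kind, tenant, subject_id)]
    for part in parts:
        # Assumed validation: reject anything that would make the key
        # ambiguous or non-ASCII.
        if (
            not part
            or not part.isascii()
            or ":" in part
            or any(ch.isspace() for ch in part)
        ):
            raise ValueError(f"invalid idempotency key component: {part!r}")
    return ":".join(parts)
```

Because the function is pure, a retry that passes the same inputs always reproduces the same key, which is what the outbox pattern relies on.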
|
||||
|
||||
`correlationId` should match the scan identifier that appears in REST responses (`scanId`). Re-using the same value across the pair of events allows Notifier and orchestrator analytics to stitch lifecycle data together.
|
||||
|
||||
## 5. Versioning and evolution
|
||||
|
||||
- Increment the `version` field and the `@<version>` suffix for **breaking** changes (field removals, type changes, semantic shifts).
|
||||
- Additive optional fields may remain within version 1; update the JSON schema and samples accordingly.
|
||||
- When introducing `@2`, keep the `@1` schema/docs in place until orchestrator subscribers confirm migration.
|
||||
|
||||
## 6. Consumer checklist
|
||||
|
||||
1. Validate incoming payloads against the schema for the targeted version.
|
||||
2. Use `idempotencyKey` for dedupe, not `eventId`.
|
||||
3. Map `traceId`/`spanId` into telemetry spans to preserve causality.
|
||||
4. Prefer `payload.report` → `policy.revisionId` when populating templates; the top-level `attributes` are convenience duplicates for quick routing.
|
||||
5. Reserve the legacy Redis events for transitional compatibility only; downstream systems should subscribe to the orchestrator bus exposed by ORCH-SVC-38-101.
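Item 2 of the checklist can be sketched as follows. This is an in-memory illustration with a hypothetical class name; a production consumer would presumably persist seen keys in durable storage with an expiry, which is outside the scope of this document.

```python
class IdempotencyFilter:
    """Drop events whose idempotencyKey has already been processed."""

    def __init__(self):
        self._seen = set()

    def should_process(self, event: dict) -> bool:
        # Dedupe on idempotencyKey, not eventId: a retried publish may
        # carry a fresh eventId but must reuse the same key.
        key = event["idempotencyKey"]
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Keying on `eventId` instead would let a republished event through twice whenever the producer generates a new identifier on retry.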
|
||||
|
||||
## 7. Implementation status and next actions
|
||||
|
||||
- **Scanner WebService** — `SCANNER-EVENTS-16-301` (blocked) and `SCANNER-EVENTS-16-302` (doing) track the production of these envelopes. The remaining blocker is the .NET 10 preview OpenAPI/Auth dependency drift that currently breaks `dotnet test`. Once Gateway and Notifier owners land the replacement packages, rerun the full test suite and capture fresh fixtures under `docs/events/samples/`.
|
||||
- **Gateway/Notifier consumers** — subscribe to the orchestrator stream documented in ORCH-SVC-38-101. When the Scanner tasks unblock, regenerate notifier contract tests against the sample events included here.
|
||||
- **Docs cadence** — update this file and the matching JSON schemas whenever payload fields change. Use the rehearsal checklist in `docs/ops/launch-cutover.md` to confirm downstream validation before the production cutover. Record gaps or newly required fields in `docs/ops/launch-readiness.md` so they land in the launch checklist.
|
||||
|
||||
---
|
||||
|
||||
**Imposed rule reminder:** work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.
|
||||
93
docs/events/samples/scanner.event.report.ready@1.sample.json
Normal file
@@ -0,0 +1,93 @@
|
||||
{
|
||||
"eventId": "6d2d1b77-f3c3-4f70-8a9d-6f2d0c8801ab",
|
||||
"kind": "scanner.event.report.ready",
|
||||
"version": 1,
|
||||
"tenant": "tenant-alpha",
|
||||
"occurredAt": "2025-10-19T12:34:56Z",
|
||||
"recordedAt": "2025-10-19T12:34:57Z",
|
||||
"source": "scanner.webservice",
|
||||
"idempotencyKey": "scanner.event.report.ready:tenant-alpha:report-abc",
|
||||
"correlationId": "report-abc",
|
||||
"traceId": "0af7651916cd43dd8448eb211c80319c",
|
||||
"spanId": "b7ad6b7169203331",
|
||||
"scope": {
|
||||
"namespace": "acme/edge",
|
||||
"repo": "api",
|
||||
"digest": "sha256:feedface"
|
||||
},
|
||||
"attributes": {
|
||||
"reportId": "report-abc",
|
||||
"policyRevisionId": "rev-42",
|
||||
"policyDigest": "digest-123",
|
||||
"verdict": "blocked"
|
||||
},
|
||||
"payload": {
|
||||
"reportId": "report-abc",
|
||||
"scanId": "report-abc",
|
||||
"imageDigest": "sha256:feedface",
|
||||
"generatedAt": "2025-10-19T12:34:56Z",
|
||||
"verdict": "fail",
|
||||
"summary": {
|
||||
"total": 1,
|
||||
"blocked": 1,
|
||||
"warned": 0,
|
||||
"ignored": 0,
|
||||
"quieted": 0
|
||||
},
|
||||
"delta": {
|
||||
"newCritical": 1,
|
||||
"kev": [
|
||||
"CVE-2024-9999"
|
||||
]
|
||||
},
|
||||
"quietedFindingCount": 0,
|
||||
"policy": {
|
||||
"digest": "digest-123",
|
||||
"revisionId": "rev-42"
|
||||
},
|
||||
"links": {
|
||||
"ui": "https://scanner.example/ui/reports/report-abc",
|
||||
"report": "https://scanner.example/api/v1/reports/report-abc",
|
||||
"policy": "https://scanner.example/api/v1/policy/revisions/rev-42",
|
||||
"attestation": "https://scanner.example/ui/attestations/report-abc"
|
||||
},
|
||||
"dsse": {
|
||||
"payloadType": "application/vnd.stellaops.report+json",
|
||||
"payload": "eyJyZXBvcnRJZCI6InJlcG9ydC1hYmMiLCJpbWFnZURpZ2VzdCI6InNoYTI1NjpmZWVkZmFjZSIsImdlbmVyYXRlZEF0IjoiMjAyNS0xMC0xOVQxMjozNDo1NiswMDowMCIsInZlcmRpY3QiOiJibG9ja2VkIiwicG9saWN5Ijp7InJldmlzaW9uSWQiOiJyZXYtNDIiLCJkaWdlc3QiOiJkaWdlc3QtMTIzIn0sInN1bW1hcnkiOnsidG90YWwiOjEsImJsb2NrZWQiOjEsIndhcm5lZCI6MCwiaWdub3JlZCI6MCwicXVpZXRlZCI6MH0sInZlcmRpY3RzIjpbeyJmaW5kaW5nSWQiOiJmaW5kaW5nLTEiLCJzdGF0dXMiOiJCbG9ja2VkIiwic2NvcmUiOjQ3LjUsInNvdXJjZVRydXN0IjoiTlZEIiwicmVhY2hhYmlsaXR5IjoicnVudGltZSJ9XSwiaXNzdWVzIjpbXX0=",
|
||||
"signatures": [
|
||||
{
|
||||
"keyId": "test-key",
|
||||
"algorithm": "hs256",
|
||||
"signature": "signature-value"
|
||||
}
|
||||
]
|
||||
},
|
||||
"report": {
|
||||
"reportId": "report-abc",
|
||||
"generatedAt": "2025-10-19T12:34:56Z",
|
||||
"imageDigest": "sha256:feedface",
|
||||
"policy": {
|
||||
"digest": "digest-123",
|
||||
"revisionId": "rev-42"
|
||||
},
|
||||
"summary": {
|
||||
"total": 1,
|
||||
"blocked": 1,
|
||||
"warned": 0,
|
||||
"ignored": 0,
|
||||
"quieted": 0
|
||||
},
|
||||
"verdict": "blocked",
|
||||
"verdicts": [
|
||||
{
|
||||
"findingId": "finding-1",
|
||||
"status": "Blocked",
|
||||
"score": 47.5,
|
||||
"sourceTrust": "NVD",
|
||||
"reachability": "runtime"
|
||||
}
|
||||
],
|
||||
"issues": []
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,99 @@
|
||||
{
|
||||
"eventId": "08a6de24-4a94-4d14-8432-9d14f36f6da3",
|
||||
"kind": "scanner.event.scan.completed",
|
||||
"version": 1,
|
||||
"tenant": "tenant-alpha",
|
||||
"occurredAt": "2025-10-19T12:34:56Z",
|
||||
"recordedAt": "2025-10-19T12:34:57Z",
|
||||
"source": "scanner.webservice",
|
||||
"idempotencyKey": "scanner.event.scan.completed:tenant-alpha:report-abc",
|
||||
"correlationId": "report-abc",
|
||||
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
|
||||
"scope": {
|
||||
"namespace": "acme/edge",
|
||||
"repo": "api",
|
||||
"digest": "sha256:feedface"
|
||||
},
|
||||
"attributes": {
|
||||
"reportId": "report-abc",
|
||||
"policyRevisionId": "rev-42",
|
||||
"policyDigest": "digest-123",
|
||||
"verdict": "blocked"
|
||||
},
|
||||
"payload": {
|
||||
"reportId": "report-abc",
|
||||
"scanId": "report-abc",
|
||||
"imageDigest": "sha256:feedface",
|
||||
"verdict": "fail",
|
||||
"summary": {
|
||||
"total": 1,
|
||||
"blocked": 1,
|
||||
"warned": 0,
|
||||
"ignored": 0,
|
||||
"quieted": 0
|
||||
},
|
||||
"delta": {
|
||||
"newCritical": 1,
|
||||
"kev": [
|
||||
"CVE-2024-9999"
|
||||
]
|
||||
},
|
||||
"policy": {
|
||||
"digest": "digest-123",
|
||||
"revisionId": "rev-42"
|
||||
},
|
||||
"findings": [
|
||||
{
|
||||
"id": "finding-1",
|
||||
"severity": "Critical",
|
||||
"cve": "CVE-2024-9999",
|
||||
"purl": "pkg:docker/acme/edge-api@sha256-feedface",
|
||||
"reachability": "runtime"
|
||||
}
|
||||
],
|
||||
"links": {
|
||||
"ui": "https://scanner.example/ui/reports/report-abc",
|
||||
"report": "https://scanner.example/api/v1/reports/report-abc",
|
||||
"policy": "https://scanner.example/api/v1/policy/revisions/rev-42",
|
||||
"attestation": "https://scanner.example/ui/attestations/report-abc"
|
||||
},
|
||||
"dsse": {
|
||||
"payloadType": "application/vnd.stellaops.report+json",
|
||||
"payload": "eyJyZXBvcnRJZCI6InJlcG9ydC1hYmMiLCJpbWFnZURpZ2VzdCI6InNoYTI1NjpmZWVkZmFjZSIsImdlbmVyYXRlZEF0IjoiMjAyNS0xMC0xOVQxMjozNDo1NiswMDowMCIsInZlcmRpY3QiOiJibG9ja2VkIiwicG9saWN5Ijp7InJldmlzaW9uSWQiOiJyZXYtNDIiLCJkaWdlc3QiOiJkaWdlc3QtMTIzIn0sInN1bW1hcnkiOnsidG90YWwiOjEsImJsb2NrZWQiOjEsIndhcm5lZCI6MCwiaWdub3JlZCI6MCwicXVpZXRlZCI6MH0sInZlcmRpY3RzIjpbeyJmaW5kaW5nSWQiOiJmaW5kaW5nLTEiLCJzdGF0dXMiOiJCbG9ja2VkIiwic2NvcmUiOjQ3LjUsInNvdXJjZVRydXN0IjoiTlZEIiwicmVhY2hhYmlsaXR5IjoicnVudGltZSJ9XSwiaXNzdWVzIjpbXX0=",
|
||||
"signatures": [
|
||||
{
|
||||
"keyId": "test-key",
|
||||
"algorithm": "hs256",
|
||||
"signature": "signature-value"
|
||||
}
|
||||
]
|
||||
},
|
||||
"report": {
|
||||
"reportId": "report-abc",
|
||||
"generatedAt": "2025-10-19T12:34:56Z",
|
||||
"imageDigest": "sha256:feedface",
|
||||
"policy": {
|
||||
"digest": "digest-123",
|
||||
"revisionId": "rev-42"
|
||||
},
|
||||
"summary": {
|
||||
"total": 1,
|
||||
"blocked": 1,
|
||||
"warned": 0,
|
||||
"ignored": 0,
|
||||
"quieted": 0
|
||||
},
|
||||
"verdict": "blocked",
|
||||
"verdicts": [
|
||||
{
|
||||
"findingId": "finding-1",
|
||||
"status": "Blocked",
|
||||
"score": 47.5,
|
||||
"sourceTrust": "NVD",
|
||||
"reachability": "runtime"
|
||||
}
|
||||
],
|
||||
"issues": []
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,36 @@
|
||||
{
|
||||
"eventId": "4d33c19c-1c8a-44d1-9954-1d5e98b2af71",
|
||||
"kind": "scheduler.graph.job.completed",
|
||||
"tenant": "tenant-alpha",
|
||||
"ts": "2025-10-26T12:00:45Z",
|
||||
"payload": {
|
||||
"jobType": "build",
|
||||
"status": "completed",
|
||||
"occurredAt": "2025-10-26T12:00:45Z",
|
||||
"job": {
|
||||
"schemaVersion": "scheduler.graph-build-job@1",
|
||||
"id": "gbj_20251026a",
|
||||
"tenantId": "tenant-alpha",
|
||||
"sbomId": "sbom_20251026",
|
||||
"sbomVersionId": "sbom_ver_20251026",
|
||||
"sbomDigest": "sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
|
||||
"graphSnapshotId": "graph_snap_20251026",
|
||||
"status": "completed",
|
||||
"trigger": "sbom-version",
|
||||
"attempts": 1,
|
||||
"cartographerJobId": "carto_job_42",
|
||||
"correlationId": "evt_svc_987",
|
||||
"createdAt": "2025-10-26T12:00:00+00:00",
|
||||
"startedAt": "2025-10-26T12:00:05+00:00",
|
||||
"completedAt": "2025-10-26T12:00:45+00:00",
|
||||
"metadata": {
|
||||
"sbomEventId": "sbom_evt_20251026"
|
||||
}
|
||||
},
|
||||
"resultUri": "oras://cartographer/offline/tenant-alpha/graph_snap_20251026"
|
||||
},
|
||||
"attributes": {
|
||||
"cartographerCluster": "offline-kit",
|
||||
"plannerShard": "graph-builders-01"
|
||||
}
|
||||
}
|
||||
164
docs/events/scanner.event.report.ready@1.json
Normal file
@@ -0,0 +1,164 @@
|
||||
{
|
||||
"$id": "https://stella-ops.org/schemas/events/scanner.event.report.ready@1.json",
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
"title": "Scanner orchestrator event – report ready (v1)",
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": [
|
||||
"eventId",
|
||||
"kind",
|
||||
"version",
|
||||
"tenant",
|
||||
"occurredAt",
|
||||
"source",
|
||||
"idempotencyKey",
|
||||
"payload"
|
||||
],
|
||||
"properties": {
|
||||
"eventId": {
|
||||
"type": "string",
|
||||
"format": "uuid",
|
||||
"description": "Globally unique identifier for this occurrence."
|
||||
},
|
||||
"kind": {
|
||||
"const": "scanner.event.report.ready",
|
||||
"description": "Event kind identifier consumed by orchestrator subscribers."
|
||||
},
|
||||
"version": {
|
||||
"const": 1,
|
||||
"description": "Schema version for orchestrator envelopes."
|
||||
},
|
||||
"tenant": {
|
||||
"type": "string",
|
||||
"description": "Tenant that owns the scan/report."
|
||||
},
|
||||
"occurredAt": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "Timestamp (UTC) when the report transitioned to ready."
|
||||
},
|
||||
"recordedAt": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "Timestamp (UTC) when the event was persisted. Optional."
|
||||
},
|
||||
"source": {
|
||||
"type": "string",
|
||||
"description": "Producer identifier, e.g. `scanner.webservice`."
|
||||
},
|
||||
"idempotencyKey": {
|
||||
"type": "string",
|
||||
"minLength": 8,
|
||||
"description": "Deterministic key used to deduplicate events downstream."
|
||||
},
|
||||
"correlationId": {
|
||||
"type": "string",
|
||||
"description": "Correlation identifier that ties this event to a request or workflow."
|
||||
},
|
||||
"traceId": {
|
||||
"type": "string",
|
||||
"description": "W3C trace ID (32 hex chars) for distributed tracing."
|
||||
},
|
||||
"spanId": {
|
||||
"type": "string",
|
||||
"description": "Optional span identifier associated with traceId."
|
||||
},
|
||||
"scope": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["repo", "digest"],
|
||||
"properties": {
|
||||
"namespace": {"type": "string"},
|
||||
"repo": {"type": "string"},
|
||||
"digest": {"type": "string"},
|
||||
"component": {"type": "string"},
|
||||
"image": {"type": "string"}
|
||||
}
|
||||
},
|
||||
"attributes": {
|
||||
"type": "object",
|
||||
"description": "String attributes for downstream correlation (policy revision, scan id, etc.).",
|
||||
"additionalProperties": {"type": "string"}
|
||||
},
|
||||
"payload": {
|
||||
"type": "object",
|
||||
"additionalProperties": true,
|
||||
"required": ["reportId", "verdict", "summary", "links", "report"],
|
||||
"properties": {
|
||||
"reportId": {"type": "string"},
|
||||
"scanId": {"type": "string"},
|
||||
"imageDigest": {"type": "string"},
|
||||
"generatedAt": {"type": "string", "format": "date-time"},
|
||||
"verdict": {"enum": ["pass", "warn", "fail"]},
|
||||
"summary": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["total", "blocked", "warned", "ignored", "quieted"],
|
||||
"properties": {
|
||||
"total": {"type": "integer", "minimum": 0},
|
||||
"blocked": {"type": "integer", "minimum": 0},
|
||||
"warned": {"type": "integer", "minimum": 0},
|
||||
"ignored": {"type": "integer", "minimum": 0},
|
||||
"quieted": {"type": "integer", "minimum": 0}
|
||||
}
|
||||
},
|
||||
"delta": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"properties": {
|
||||
"newCritical": {"type": "integer", "minimum": 0},
|
||||
"newHigh": {"type": "integer", "minimum": 0},
|
||||
"kev": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"}
|
||||
}
|
||||
}
|
||||
},
|
||||
"quietedFindingCount": {
|
||||
"type": "integer",
|
||||
"minimum": 0
|
||||
},
|
||||
"policy": {
|
||||
"type": "object",
|
||||
"description": "Policy revision metadata surfaced alongside the report."
|
||||
},
|
||||
"links": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"properties": {
|
||||
"ui": {"type": "string", "format": "uri"},
|
||||
"report": {"type": "string", "format": "uri"},
|
||||
"policy": {"type": "string", "format": "uri"},
|
||||
"attestation": {"type": "string", "format": "uri"}
|
||||
}
|
||||
},
|
||||
"dsse": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["payloadType", "payload", "signatures"],
|
||||
"properties": {
|
||||
"payloadType": {"type": "string"},
|
||||
"payload": {"type": "string"},
|
||||
"signatures": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["keyId", "algorithm", "signature"],
|
||||
"properties": {
|
||||
"keyId": {"type": "string"},
|
||||
"algorithm": {"type": "string"},
|
||||
"signature": {"type": "string"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"report": {
|
||||
"type": "object",
|
||||
"description": "Canonical scanner report document that aligns with the DSSE payload."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
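A consumer that cannot run a full JSON Schema validator can still spot-check the constraints this schema states. The sketch below covers only a subset (required fields, the `kind` and `version` constants, the `idempotencyKey` minimum length, the `verdict` enum); the function name is hypothetical and the checks are not a substitute for validating against the schema itself.

```python
import uuid

# Field lists copied from the schema's "required" arrays.
ENVELOPE_REQUIRED = (
    "eventId", "kind", "version", "tenant",
    "occurredAt", "source", "idempotencyKey", "payload",
)
PAYLOAD_REQUIRED = ("reportId", "verdict", "summary", "links", "report")
VERDICTS = {"pass", "warn", "fail"}


def check_report_ready(event: dict) -> list:
    """Return a list of problems; an empty list means the spot checks passed."""
    problems = [f"missing envelope field: {name}" for name in ENVELOPE_REQUIRED if name not in event]
    if event.get("kind") != "scanner.event.report.ready":
        problems.append("kind must be 'scanner.event.report.ready'")
    if event.get("version") != 1:
        problems.append("version must be 1")
    if len(str(event.get("idempotencyKey", ""))) < 8:
        problems.append("idempotencyKey must be at least 8 characters")
    try:
        uuid.UUID(str(event.get("eventId")))
    except ValueError:
        problems.append("eventId is not a UUID")
    payload = event.get("payload")
    if isinstance(payload, dict):
        problems += [f"missing payload field: {name}" for name in PAYLOAD_REQUIRED if name not in payload]
        if "verdict" in payload and payload["verdict"] not in VERDICTS:
            problems.append("payload.verdict must be pass, warn, or fail")
    return problems


if __name__ == "__main__":
    # Illustrative event; identifiers are made up for the example.
    event = {
        "eventId": "4d33c19c-1c8a-44d1-9954-1d5e98b2af71",
        "kind": "scanner.event.report.ready",
        "version": 1,
        "tenant": "tenant-alpha",
        "occurredAt": "2025-10-19T12:34:56Z",
        "source": "scanner.webservice",
        "idempotencyKey": "report-abc-ready",
        "payload": {"reportId": "report-abc", "verdict": "fail", "summary": {}, "links": {}, "report": {}},
    }
    for problem in check_report_ready(event):
        print(problem)
```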
|
||||
174
docs/events/scanner.event.scan.completed@1.json
Normal file
@@ -0,0 +1,174 @@
|
||||
{
|
||||
"$id": "https://stella-ops.org/schemas/events/scanner.event.scan.completed@1.json",
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
"title": "Scanner orchestrator event – scan completed (v1)",
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": [
|
||||
"eventId",
|
||||
"kind",
|
||||
"version",
|
||||
"tenant",
|
||||
"occurredAt",
|
||||
"source",
|
||||
"idempotencyKey",
|
||||
"payload"
|
||||
],
|
||||
"properties": {
|
||||
"eventId": {
|
||||
"type": "string",
|
||||
"format": "uuid",
|
||||
"description": "Globally unique identifier for this occurrence."
|
||||
},
|
||||
"kind": {
|
||||
"const": "scanner.event.scan.completed",
|
||||
"description": "Event kind identifier consumed by orchestrator subscribers."
|
||||
},
|
||||
"version": {
|
||||
"const": 1,
|
||||
"description": "Schema version for orchestrator envelopes."
|
||||
},
|
||||
"tenant": {
|
||||
"type": "string",
|
||||
"description": "Tenant that owns the scan."
|
||||
},
|
||||
"occurredAt": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "Timestamp (UTC) when the scan completed."
|
||||
},
|
||||
"recordedAt": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "Timestamp (UTC) when the event was persisted. Optional."
|
||||
},
|
||||
"source": {
|
||||
"type": "string",
|
||||
"description": "Producer identifier, e.g. `scanner.webservice`."
|
||||
},
|
||||
"idempotencyKey": {
|
||||
"type": "string",
|
||||
"minLength": 8,
|
||||
"description": "Deterministic key used to deduplicate events downstream."
|
||||
},
|
||||
"correlationId": {
|
||||
"type": "string",
|
||||
"description": "Correlation identifier tying this event to a request or workflow."
|
||||
},
|
||||
"traceId": {
|
||||
"type": "string",
|
||||
"description": "W3C trace ID (32 hex chars) for distributed tracing."
|
||||
},
|
||||
"spanId": {
|
||||
"type": "string",
|
||||
"description": "Optional span identifier associated with traceId."
|
||||
},
|
||||
"scope": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["repo", "digest"],
|
||||
"properties": {
|
||||
"namespace": {"type": "string"},
|
||||
"repo": {"type": "string"},
|
||||
"digest": {"type": "string"},
|
||||
"component": {"type": "string"},
|
||||
"image": {"type": "string"}
|
||||
}
|
||||
},
|
||||
"attributes": {
|
||||
"type": "object",
|
||||
"description": "String attributes for downstream correlation (policy revision, scan id, etc.).",
|
||||
"additionalProperties": {"type": "string"}
|
||||
},
|
||||
"payload": {
|
||||
"type": "object",
|
||||
"additionalProperties": true,
|
||||
"required": ["reportId", "scanId", "imageDigest", "verdict", "summary", "report"],
|
||||
"properties": {
|
||||
"reportId": {"type": "string"},
|
||||
"scanId": {"type": "string"},
|
||||
"imageDigest": {"type": "string"},
|
||||
"verdict": {"enum": ["pass", "warn", "fail"]},
|
||||
"summary": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["total", "blocked", "warned", "ignored", "quieted"],
|
||||
"properties": {
|
||||
"total": {"type": "integer", "minimum": 0},
|
||||
"blocked": {"type": "integer", "minimum": 0},
|
||||
"warned": {"type": "integer", "minimum": 0},
|
||||
"ignored": {"type": "integer", "minimum": 0},
|
||||
"quieted": {"type": "integer", "minimum": 0}
|
||||
}
|
||||
},
|
||||
"delta": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"properties": {
|
||||
"newCritical": {"type": "integer", "minimum": 0},
|
||||
"newHigh": {"type": "integer", "minimum": 0},
|
||||
"kev": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"}
|
||||
}
|
||||
}
|
||||
},
|
||||
"policy": {
|
||||
"type": "object",
|
||||
"description": "Policy revision metadata surfaced alongside the report."
|
||||
},
|
||||
"findings": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["id"],
|
||||
"properties": {
|
||||
"id": {"type": "string"},
|
||||
"severity": {"type": "string"},
|
||||
"cve": {"type": "string"},
|
||||
"purl": {"type": "string"},
|
||||
"reachability": {"type": "string"}
|
||||
}
|
||||
}
|
||||
},
|
||||
"links": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"properties": {
|
||||
"ui": {"type": "string", "format": "uri"},
|
||||
"report": {"type": "string", "format": "uri"},
|
||||
"policy": {"type": "string", "format": "uri"},
|
||||
"attestation": {"type": "string", "format": "uri"}
|
||||
}
|
||||
},
|
||||
"dsse": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["payloadType", "payload", "signatures"],
|
||||
"properties": {
|
||||
"payloadType": {"type": "string"},
|
||||
"payload": {"type": "string"},
|
||||
"signatures": {
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["keyId", "algorithm", "signature"],
|
||||
"properties": {
|
||||
"keyId": {"type": "string"},
|
||||
"algorithm": {"type": "string"},
|
||||
"signature": {"type": "string"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"report": {
|
||||
"type": "object",
|
||||
"description": "Canonical scanner report document that aligns with the DSSE payload."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
196
docs/events/scheduler.graph.job.completed@1.json
Normal file
@@ -0,0 +1,196 @@
|
||||
{
|
||||
"$id": "https://stella-ops.org/schemas/events/scheduler.graph.job.completed@1.json",
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
"title": "Scheduler Graph Job Completed Event",
|
||||
"description": "Legacy scheduler event emitted when a graph build or overlay job reaches a terminal state. Consumers validate downstream caches and surface overlay freshness.",
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["eventId", "kind", "tenant", "ts", "payload"],
|
||||
"properties": {
|
||||
"eventId": {
|
||||
"type": "string",
|
||||
"format": "uuid",
|
||||
"description": "Globally unique identifier per event."
|
||||
},
|
||||
"kind": {
|
||||
"const": "scheduler.graph.job.completed"
|
||||
},
|
||||
"tenant": {
|
||||
"type": "string",
|
||||
"description": "Tenant identifier scoped to the originating job."
|
||||
},
|
||||
"ts": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "UTC timestamp when the job reached a terminal state."
|
||||
},
|
||||
"payload": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": ["jobType", "job", "status", "occurredAt"],
|
||||
"properties": {
|
||||
"jobType": {
|
||||
"type": "string",
|
||||
"enum": ["build", "overlay"],
|
||||
"description": "Job flavour, matches the CLR type of the serialized job payload."
|
||||
},
|
||||
"status": {
|
||||
"type": "string",
|
||||
"enum": ["completed", "failed", "cancelled"],
|
||||
"description": "Terminal status recorded for the job."
|
||||
},
|
||||
"occurredAt": {
|
||||
"type": "string",
|
||||
"format": "date-time",
|
||||
"description": "UTC timestamp of the terminal transition, mirrors job.CompletedAt."
|
||||
},
|
||||
"job": {
|
||||
"oneOf": [
|
||||
{"$ref": "#/definitions/graphBuildJob"},
|
||||
{"$ref": "#/definitions/graphOverlayJob"}
|
||||
],
|
||||
"description": "Canonical serialized representation of the finished job."
|
||||
},
|
||||
"resultUri": {
|
||||
"type": "string",
|
||||
"description": "Optional URI pointing to Cartographer snapshot or overlay bundle (if available)."
|
||||
}
|
||||
}
|
||||
},
|
||||
"attributes": {
|
||||
"type": "object",
|
||||
"description": "Optional correlation bag for downstream consumers.",
|
||||
"additionalProperties": {
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
},
|
||||
"definitions": {
|
||||
"graphBuildJob": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": [
|
||||
"schemaVersion",
|
||||
"id",
|
||||
"tenantId",
|
||||
"sbomId",
|
||||
"sbomVersionId",
|
||||
"sbomDigest",
|
||||
"status",
|
||||
"trigger",
|
||||
"attempts",
|
||||
"createdAt"
|
||||
],
|
||||
"properties": {
|
||||
"schemaVersion": {
|
||||
"const": "scheduler.graph-build-job@1"
|
||||
},
|
||||
"id": {"type": "string"},
|
||||
"tenantId": {"type": "string"},
|
||||
"sbomId": {"type": "string"},
|
||||
"sbomVersionId": {"type": "string"},
|
||||
"sbomDigest": {
|
||||
"type": "string",
|
||||
"pattern": "^sha256:[a-f0-9]{64}$"
|
||||
},
|
||||
"graphSnapshotId": {"type": "string"},
|
||||
"status": {
|
||||
"type": "string",
|
||||
"enum": ["pending", "queued", "running", "completed", "failed", "cancelled"]
|
||||
},
|
||||
"trigger": {
|
||||
"type": "string",
|
||||
"enum": ["sbom-version", "backfill", "manual"]
|
||||
},
|
||||
"attempts": {
|
||||
"type": "integer",
|
||||
"minimum": 0
|
||||
},
|
||||
"cartographerJobId": {"type": "string"},
|
||||
"correlationId": {"type": "string"},
|
||||
"createdAt": {
|
||||
"type": "string",
|
||||
"format": "date-time"
|
||||
},
|
||||
"startedAt": {
|
||||
"type": "string",
|
||||
"format": "date-time"
|
||||
},
|
||||
"completedAt": {
|
||||
"type": "string",
|
||||
"format": "date-time"
|
||||
},
|
||||
"error": {"type": "string"},
|
||||
"metadata": {
|
||||
"type": "object",
|
||||
"additionalProperties": {"type": "string"}
|
||||
}
|
||||
}
|
||||
},
|
||||
"graphOverlayJob": {
|
||||
"type": "object",
|
||||
"additionalProperties": false,
|
||||
"required": [
|
||||
"schemaVersion",
|
||||
"id",
|
||||
"tenantId",
|
||||
"graphSnapshotId",
|
||||
"overlayKind",
|
||||
"overlayKey",
|
||||
"status",
|
||||
"trigger",
|
||||
"attempts",
|
||||
"createdAt"
|
||||
],
|
||||
"properties": {
|
||||
"schemaVersion": {
|
||||
"const": "scheduler.graph-overlay-job@1"
|
||||
},
|
||||
"id": {"type": "string"},
|
||||
"tenantId": {"type": "string"},
|
||||
"graphSnapshotId": {"type": "string"},
|
||||
"buildJobId": {"type": "string"},
|
||||
"overlayKind": {
|
||||
"type": "string",
|
||||
"enum": ["policy", "advisory", "vex"]
|
||||
},
|
||||
"overlayKey": {"type": "string"},
|
||||
"subjects": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"},
|
||||
"uniqueItems": true
|
||||
},
|
||||
"status": {
|
||||
"type": "string",
|
||||
"enum": ["pending", "queued", "running", "completed", "failed", "cancelled"]
|
||||
},
|
||||
"trigger": {
|
||||
"type": "string",
|
||||
"enum": ["policy", "advisory", "vex", "sbom-version", "manual"]
|
||||
},
|
||||
"attempts": {
|
||||
"type": "integer",
|
||||
"minimum": 0
|
||||
},
|
||||
"correlationId": {"type": "string"},
|
||||
"createdAt": {
|
||||
"type": "string",
|
||||
"format": "date-time"
|
||||
},
|
||||
"startedAt": {
|
||||
"type": "string",
|
||||
"format": "date-time"
|
||||
},
|
||||
"completedAt": {
|
||||
"type": "string",
|
||||
"format": "date-time"
|
||||
},
|
||||
"error": {"type": "string"},
|
||||
"metadata": {
|
||||
"type": "object",
|
||||
"additionalProperties": {"type": "string"}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
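The `job` property uses `oneOf` over two definitions that are told apart by their `schemaVersion` constant. A consumer can use that constant as a discriminator before applying job-specific checks. The sketch below assumes `jobType` and `schemaVersion` always agree (the schema's description suggests this but does not enforce it) and checks only a few constraints; the function name is hypothetical.

```python
import re

# Pattern and enums copied from the schema.
SBOM_DIGEST = re.compile(r"sha256:[a-f0-9]{64}")
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}
SCHEMA_BY_JOB_TYPE = {
    "build": "scheduler.graph-build-job@1",
    "overlay": "scheduler.graph-overlay-job@1",
}


def check_graph_job_payload(payload: dict) -> list:
    """Return a list of problems found in a scheduler.graph.job.completed payload."""
    problems = []
    job = payload.get("job") or {}
    job_type = payload.get("jobType")
    expected = SCHEMA_BY_JOB_TYPE.get(job_type)
    if expected is None:
        problems.append("jobType must be 'build' or 'overlay'")
    elif job.get("schemaVersion") != expected:
        problems.append(f"job.schemaVersion should be {expected} for jobType {job_type}")
    if payload.get("status") not in TERMINAL_STATUSES:
        problems.append("status must be completed, failed, or cancelled")
    # Only build jobs carry sbomDigest; fullmatch avoids accepting a trailing newline.
    if job_type == "build" and not SBOM_DIGEST.fullmatch(str(job.get("sbomDigest", ""))):
        problems.append("job.sbomDigest must be sha256: followed by 64 lowercase hex characters")
    return problems


if __name__ == "__main__":
    payload = {
        "jobType": "build",
        "status": "completed",
        "occurredAt": "2025-10-26T12:00:45Z",
        "job": {
            "schemaVersion": "scheduler.graph-build-job@1",
            "sbomDigest": "sha256:" + "0123456789abcdef" * 4,
        },
    }
    for problem in check_graph_job_payload(payload):
        print(problem)
```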
|
||||
16
docs/examples/policies/README.md
Normal file
@@ -0,0 +1,16 @@
|
||||
# Policy Examples
|
||||
|
||||
Sample `stella-dsl@1` policies illustrating common deployment personas. Each example includes commentary, CLI usage hints, and a compliance checklist.
|
||||
|
||||
| Example | Description |
|
||||
|---------|-------------|
|
||||
| [Baseline](baseline.md) | Balanced production defaults (block critical, respect strong VEX). |
|
||||
| [Serverless](serverless.md) | Aggressive blocking for serverless workloads (no High+, pinned base images). |
|
||||
| [Internal Only](internal-only.md) | Lenient policy for internal/dev environments with KEV safeguards. |
|
||||
|
||||
Policy source files (`*.stella`) live alongside the documentation so you can copy/paste or use `stella policy new --from file://...`.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26.*
|
||||
|
||||
79
docs/examples/policies/baseline.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# Baseline Policy Example (`baseline.stella`)
|
||||
|
||||
This sample policy provides a balanced default for production workloads: block critical findings, require strong VEX justifications to suppress advisories, and warn on deprecated runtimes. Use it as a starting point for tenants that want guardrails without excessive noise.
|
||||
|
||||
```dsl
|
||||
policy "Baseline Production Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Block critical, escalate high, enforce VEX justifications."
|
||||
tags = ["baseline","production"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
map vendor_weight {
|
||||
source "GHSA" => +0.5
|
||||
source "OSV" => +0.0
|
||||
source "VendorX" => -0.2
|
||||
}
|
||||
env exposure_adjustments {
|
||||
if env.exposure == "internet" then +0.5
|
||||
if env.runtime == "legacy" then +0.3
|
||||
}
|
||||
}
|
||||
|
||||
rule block_critical priority 5 {
|
||||
when severity.normalized >= "Critical"
|
||||
then status := "blocked"
|
||||
because "Critical severity must be remediated before deploy."
|
||||
}
|
||||
|
||||
rule escalate_high_internet {
|
||||
when severity.normalized == "High"
|
||||
and env.exposure == "internet"
|
||||
then escalate to severity_band("Critical")
|
||||
because "High severity on internet-exposed asset escalates to critical."
|
||||
}
|
||||
|
||||
rule require_vex_justification {
|
||||
when vex.any(status in ["not_affected","fixed"])
|
||||
and vex.justification in ["component_not_present","vulnerable_code_not_present"]
|
||||
then status := vex.status
|
||||
annotate winning_statement := vex.latest().statementId
|
||||
because "Respect strong vendor VEX claims."
|
||||
}
|
||||
|
||||
rule alert_warn_eol_runtime priority 1 {
|
||||
when severity.normalized <= "Medium"
|
||||
and sbom.has_tag("runtime:eol")
|
||||
then warn message "Runtime marked as EOL; upgrade recommended."
|
||||
because "Deprecated runtime should be upgraded."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Commentary
|
||||
|
||||
- **Severity profile** tightens vendor weights and applies exposure modifiers so internet-facing/high severity pairs escalate automatically.
|
||||
- **VEX rule** only honours strong justifications, preventing weaker claims from hiding issues.
|
||||
- **Warnings first** – The `alert_warn_eol_runtime` rule name ensures it sorts before the require-VEX rule, keeping alerts visible without flipping to `RequiresVex`.
|
||||
- Works well as a shared `tenant-global` baseline; use tenant overrides for stricter or more tolerant environments.
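To make the severity profile concrete, the sketch below models one plausible reading of it: vendor weights and environment adjustments as additive offsets on a numeric base score. This is an assumption for illustration only. The document does not state how the policy engine combines these values, and the function names and the base score are hypothetical.

```python
# Offsets copied from the `profile severity` block of the baseline policy.
VENDOR_WEIGHT = {"GHSA": 0.5, "OSV": 0.0, "VendorX": -0.2}


def exposure_adjustment(env: dict) -> float:
    """Sum the environment offsets whose conditions hold (assumed additive)."""
    total = 0.0
    if env.get("exposure") == "internet":
        total += 0.5
    if env.get("runtime") == "legacy":
        total += 0.3
    return total


def adjusted_score(base: float, source: str, env: dict) -> float:
    """Apply vendor weight and environment offsets to a base score.

    Unknown sources get no adjustment; that default is also an assumption.
    """
    return base + VENDOR_WEIGHT.get(source, 0.0) + exposure_adjustment(env)


if __name__ == "__main__":
    # Hypothetical base score for a finding reported by GHSA on an internet-facing asset.
    print(adjusted_score(7.0, "GHSA", {"exposure": "internet"}))
```

Under this reading, the same finding scores higher on an internet-exposed asset than on an internal one, which is the effect the first commentary bullet describes.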
|
||||
|
||||
## Try it out
|
||||
|
||||
```bash
|
||||
stella policy new --policy-id P-baseline --template blank --open
|
||||
stella policy lint examples/policies/baseline.stella
|
||||
stella policy simulate P-baseline --candidate 1 --sbom sbom:sample-prod
|
||||
```
|
||||
|
||||
## Compliance checklist
|
||||
|
||||
- [ ] Policy compiled via `stella policy lint` without diagnostics.
|
||||
- [ ] Simulation diff reviewed against golden SBOM set.
|
||||
- [ ] Approval note documents rationale before promoting to production.
|
||||
- [ ] EOL runtime tags kept up to date in SBOM metadata.
|
||||
- [ ] VEX vendor allow-list reviewed quarterly.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26.*
|
||||
46
docs/examples/policies/baseline.stella
Normal file
@@ -0,0 +1,46 @@
|
||||
policy "Baseline Production Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Block critical, escalate high, enforce VEX justifications."
|
||||
tags = ["baseline","production"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
map vendor_weight {
|
||||
source "GHSA" => +0.5
|
||||
source "OSV" => +0.0
|
||||
source "VendorX" => -0.2
|
||||
}
|
||||
env exposure_adjustments {
|
||||
if env.exposure == "internet" then +0.5
|
||||
if env.runtime == "legacy" then +0.3
|
||||
}
|
||||
}
|
||||
|
||||
rule block_critical priority 5 {
|
||||
when severity.normalized >= "Critical"
|
||||
then status := "blocked"
|
||||
because "Critical severity must be remediated before deploy."
|
||||
}
|
||||
|
||||
rule escalate_high_internet {
|
||||
when severity.normalized == "High"
|
||||
and env.exposure == "internet"
|
||||
then escalate to severity_band("Critical")
|
||||
because "High severity on internet-exposed asset escalates to critical."
|
||||
}
|
||||
|
||||
rule require_vex_justification {
|
||||
when vex.any(status in ["not_affected","fixed"])
|
||||
and vex.justification in ["component_not_present","vulnerable_code_not_present"]
|
||||
then status := vex.status
|
||||
annotate winning_statement := vex.latest().statementId
|
||||
because "Respect strong vendor VEX claims."
|
||||
}
|
||||
|
||||
rule alert_warn_eol_runtime priority 1 {
|
||||
when severity.normalized <= "Medium"
|
||||
and sbom.has_tag("runtime:eol")
|
||||
then warn message "Runtime marked as EOL; upgrade recommended."
|
||||
because "Deprecated runtime should be upgraded."
|
||||
}
|
||||
}
|
||||
34
docs/examples/policies/baseline.yaml
Normal file
@@ -0,0 +1,34 @@
|
||||
version: "1.0"
|
||||
metadata:
|
||||
description: Baseline production policy
|
||||
tags:
|
||||
- baseline
|
||||
- production
|
||||
rules:
|
||||
- name: Block Critical
|
||||
severity: [Critical]
|
||||
action: block
|
||||
|
||||
- name: Escalate High Internet
|
||||
severity: [High]
|
||||
environments: [internet]
|
||||
action:
|
||||
type: escalate
|
||||
escalate:
|
||||
minimumSeverity: Critical
|
||||
|
||||
- name: Require VEX justification
|
||||
sources: [NVD, GHSA]
|
||||
action:
|
||||
type: requireVex
|
||||
requireVex:
|
||||
vendors: [VendorX, VendorY]
|
||||
justifications:
|
||||
- component_not_present
|
||||
- vulnerable_code_not_present
|
||||
|
||||
- name: Alert warn EOL runtime
|
||||
priority: 1
|
||||
severity: [Low, Medium]
|
||||
tags: [runtime:eol]
|
||||
action: warn
|
||||
72
docs/examples/policies/internal-only.md
Normal file
@@ -0,0 +1,72 @@
|
||||
# Internal-Only Policy Example (`internal-only.stella`)
|
||||
|
||||
A relaxed profile for internal services and development environments: allow Medium severities with warnings, rely on VEX more heavily, but still block KEV/actively exploited advisories.
|
||||
|
||||
```dsl
|
||||
policy "Internal Only Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Lenient policy for internal / dev tenants."
|
||||
tags = ["internal","dev"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
env exposure_adjustments {
|
||||
if env.exposure == "internal" then -0.4
|
||||
if env.stage == "dev" then -0.6
|
||||
}
|
||||
}
|
||||
|
||||
rule block_kev priority 1 {
|
||||
when advisory.has_tag("kev")
|
||||
then status := "blocked"
|
||||
because "Known exploited vulnerabilities must be remediated."
|
||||
}
|
||||
|
||||
rule allow_medium_with_warning {
|
||||
when severity.normalized == "Medium"
|
||||
and env.exposure == "internal"
|
||||
then warn message "Medium severity permitted in internal environments."
|
||||
because "Allow Medium findings with warning for internal workloads."
|
||||
}
|
||||
|
||||
rule accept_vendor_vex {
|
||||
when vex.any(status in ["not_affected","fixed"])
|
||||
then status := vex.status
|
||||
annotate justification := vex.latest().justification
|
||||
because "Trust vendor VEX statements for internal scope."
|
||||
}
|
||||
|
||||
rule quiet_low_priority {
|
||||
when severity.normalized <= "Low"
|
||||
then ignore until "2026-01-01T00:00:00Z"
|
||||
because "Quiet low severity until next annual remediation sweep."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Commentary
|
||||
|
||||
- Suitable for staging/dev tenants with lower blast radius.
|
||||
- KEV advisories override the lenient behaviour to maintain a minimum security bar.
|
||||
- Warnings ensure Medium findings stay visible in dashboards and CLI outputs.
|
||||
- The quiet rule enforces a planned clean-up date; update it before expiry.
|
||||
|
||||
## Try it out
|
||||
|
||||
```bash
|
||||
stella policy lint examples/policies/internal-only.stella
|
||||
stella policy simulate P-internal --candidate 1 \
|
||||
--sbom sbom:internal-service --env exposure=internal --env stage=dev
|
||||
```
|
||||
|
||||
## Compliance checklist
|
||||
|
||||
- [ ] Tenant classified as internal-only with documented risk acceptance.
|
||||
- [ ] KEV feed synced (Concelier) and tags confirmed before relying on rule.
|
||||
- [ ] Quiet expiry tracked; remediation backlog updated prior to deadline.
|
||||
- [ ] Developers informed that warnings still affect quality score.
|
||||
- [ ] Policy not used for production or internet-exposed services.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26.*
|
||||
39
docs/examples/policies/internal-only.stella
Normal file
@@ -0,0 +1,39 @@
|
||||
policy "Internal Only Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Lenient policy for internal / dev tenants."
|
||||
tags = ["internal","dev"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
env exposure_adjustments {
|
||||
if env.exposure == "internal" then -0.4
|
||||
if env.stage == "dev" then -0.6
|
||||
}
|
||||
}
|
||||
|
||||
rule block_kev priority 1 {
|
||||
when advisory.has_tag("kev")
|
||||
then status := "blocked"
|
||||
because "Known exploited vulnerabilities must be remediated."
|
||||
}
|
||||
|
||||
rule allow_medium_with_warning {
|
||||
when severity.normalized == "Medium"
|
||||
and env.exposure == "internal"
|
||||
then warn message "Medium severity permitted in internal environments."
|
||||
because "Allow Medium findings with warning for internal workloads."
|
||||
}
|
||||
|
||||
rule accept_vendor_vex {
|
||||
when vex.any(status in ["not_affected","fixed"])
|
||||
then status := vex.status
|
||||
annotate justification := vex.latest().justification
|
||||
because "Trust vendor VEX statements for internal scope."
|
||||
}
|
||||
|
||||
rule quiet_low_priority {
|
||||
when severity.normalized <= "Low"
|
||||
then ignore until "2026-01-01T00:00:00Z"
|
||||
because "Quiet low severity until next annual remediation sweep."
|
||||
}
|
||||
}
|
||||
31
docs/examples/policies/internal-only.yaml
Normal file
@@ -0,0 +1,31 @@
|
||||
version: "1.0"
|
||||
metadata:
|
||||
description: Relaxed internal/development policy
|
||||
tags:
|
||||
- internal
|
||||
- dev
|
||||
rules:
|
||||
- name: Block KEV advisories
|
||||
tags: [kev]
|
||||
action: block
|
||||
|
||||
- name: Warn medium severity
|
||||
severity: [Medium]
|
||||
environments: [internal]
|
||||
action: warn
|
||||
|
||||
- name: Accept vendor VEX
|
||||
action:
|
||||
type: require_vex
|
||||
requireVex:
|
||||
vendors: [VendorX, VendorY]
|
||||
justifications:
|
||||
- component_not_present
|
||||
- vulnerable_code_not_present
|
||||
|
||||
- name: Quiet low severity
|
||||
severity: [Low, Informational]
|
||||
action:
|
||||
type: ignore
|
||||
until: 2026-01-01T00:00:00Z
|
||||
justification: "Deferred to annual remediation cycle"
|
||||
72
docs/examples/policies/serverless.md
Normal file
@@ -0,0 +1,72 @@
|
||||
# Serverless Policy Example (`serverless.stella`)
|
||||
|
||||
Optimised for short-lived serverless workloads: focus on runtime integrity, disallow vulnerable layers entirely, and permit temporary suppressions only with strict justification windows.
|
||||
|
||||
```dsl
|
||||
policy "Serverless Tight Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Aggressive blocking for serverless runtimes."
|
||||
tags = ["serverless","prod","strict"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
env runtime_overrides {
|
||||
if env.runtime == "serverless" then +0.7
|
||||
if env.runtime == "batch" then +0.2
|
||||
}
|
||||
}
|
||||
|
||||
rule block_any_high {
|
||||
when severity.normalized >= "High"
|
||||
then status := "blocked"
|
||||
because "Serverless workloads block High+ severities."
|
||||
}
|
||||
|
||||
rule forbid_unpinned_base {
|
||||
when sbom.has_tag("image:latest-tag")
|
||||
then status := "blocked"
|
||||
because "Base image must be pinned (no :latest)."
|
||||
}
|
||||
|
||||
rule zero_tolerance_vex {
|
||||
when vex.any(status == "not_affected")
|
||||
then requireVex { vendors = ["VendorX","VendorY"], justifications = ["component_not_present"] }
|
||||
because "Allow not_affected only from trusted vendors with strongest justification."
|
||||
}
|
||||
|
||||
rule temporary_quiet {
|
||||
when env.deployment == "canary"
|
||||
and severity.normalized == "Medium"
|
||||
then ignore until coalesce(env.quietUntil, "2025-12-31T00:00:00Z")
|
||||
because "Allow short canary quiet window while fix rolls out."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Commentary
|
||||
|
||||
- Designed for serverless tenants where redeploy cost is low and failing fast is preferred.
|
||||
- `forbid_unpinned_base` enforces supply-chain best practices.
|
||||
- `temporary_quiet` ensures quiet windows expire automatically. Deployments should set `env.quietUntil`; when it is absent, the rule falls back to `2025-12-31T00:00:00Z`.
|
||||
- Intended to be layered on top of baseline (override per tenant) or used standalone for serverless-only accounts.
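
The fallback in `temporary_quiet` can be sketched in Python. This is an illustration only: it assumes `coalesce` returns its first non-null argument and that the ignore applies while the evaluation time is before the `until` timestamp. Neither behaviour is confirmed here by the DSL specification, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

DEFAULT_QUIET_UNTIL = "2025-12-31T00:00:00Z"  # fallback literal from the rule


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first argument that is not None (assumed DSL semantics)."""
    for value in values:
        if value is not None:
            return value
    return None


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp that ends in 'Z'."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def quiet_is_active(env: dict, now: datetime) -> bool:
    """True while the canary quiet window has not yet expired."""
    until = parse_utc(coalesce(env.get("quietUntil"), DEFAULT_QUIET_UNTIL))
    return now < until
```

Under these assumptions, a deployment that omits `quietUntil` stays quiet only until the hard-coded date, which is why setting the variable explicitly is the safer habit.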
|
||||
|
||||
## Try it out
|
||||
|
||||
```bash
|
||||
stella policy lint examples/policies/serverless.stella
|
||||
stella policy simulate P-serverless --candidate 1 \
|
||||
--sbom sbom:lambda-hello --env runtime=serverless --env deployment=canary
|
||||
```
|
||||
|
||||
## Compliance checklist
|
||||
|
||||
- [ ] Quiet window expirations tracked and documented.
|
||||
- [ ] Trusted VEX vendor list reviewed quarterly.
|
||||
- [ ] Deployment pipeline enforces pinned base images before approval.
|
||||
- [ ] Canary deployments monitored for recurrence before ignoring Medium severity.
|
||||
- [ ] Serverless teams acknowledge runbook for blocked deployments.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26.*
|
||||
|
||||
39
docs/examples/policies/serverless.stella
Normal file
@@ -0,0 +1,39 @@
|
||||
policy "Serverless Tight Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Aggressive blocking for serverless runtimes."
|
||||
tags = ["serverless","prod","strict"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
env runtime_overrides {
|
||||
if env.runtime == "serverless" then +0.7
|
||||
if env.runtime == "batch" then +0.2
|
||||
}
|
||||
}
|
||||
|
||||
rule block_any_high {
|
||||
when severity.normalized >= "High"
|
||||
then status := "blocked"
|
||||
because "Serverless workloads block High+ severities."
|
||||
}
|
||||
|
||||
rule forbid_unpinned_base {
|
||||
when sbom.has_tag("image:latest-tag")
|
||||
then status := "blocked"
|
||||
because "Base image must be pinned (no :latest)."
|
||||
}
|
||||
|
||||
rule zero_tolerance_vex {
|
||||
when vex.any(status == "not_affected")
|
||||
then requireVex { vendors = ["VendorX","VendorY"], justifications = ["component_not_present"] }
|
||||
because "Allow not_affected only from trusted vendors with strongest justification."
|
||||
}
|
||||
|
||||
rule temporary_quiet {
|
||||
when env.deployment == "canary"
|
||||
and severity.normalized == "Medium"
|
||||
then ignore until coalesce(env.quietUntil, "2025-12-31T00:00:00Z")
|
||||
because "Allow short canary quiet window while fix rolls out."
|
||||
}
|
||||
}
|
||||
|
||||
30
docs/examples/policies/serverless.yaml
Normal file
@@ -0,0 +1,30 @@
|
||||
version: "1.0"
|
||||
metadata:
|
||||
description: Strict policy for serverless workloads
|
||||
tags:
|
||||
- serverless
|
||||
- prod
|
||||
- strict
|
||||
rules:
|
||||
- name: Block High And Above
|
||||
severity: [High, Critical]
|
||||
action: block
|
||||
|
||||
- name: Forbid Unpinned Base Images
|
||||
tags: [image:latest-tag]
|
||||
action: block
|
||||
|
||||
- name: Require Trusted VEX
|
||||
action:
|
||||
type: require_vex
|
||||
requireVex:
|
||||
vendors: [VendorX, VendorY]
|
||||
justifications: [component_not_present]
|
||||
|
||||
- name: Quiet Medium Canary
|
||||
severity: [Medium]
|
||||
environments: [canary]
|
||||
action:
|
||||
type: ignore
|
||||
until: 2025-12-31T00:00:00Z
|
||||
justification: "Temporary canary exception"
|
||||
96
docs/faq/policy-faq.md
Normal file
@@ -0,0 +1,96 @@
|
||||
# Policy Engine FAQ
|
||||
|
||||
Answers to questions that Support, Ops, and Policy Guild teams receive most frequently. Pair this FAQ with the [Policy Lifecycle](../policy/lifecycle.md), [Runs](../policy/runs.md), and [CLI guide](../cli/policy.md) for deeper explanations.
|
||||
|
||||
---
|
||||
|
||||
## Authoring & DSL
|
||||
|
||||
**Q:** *Lint succeeds locally, but submit still fails with `ERR_POL_001`. Why?*
|
||||
**A:** The CLI requires lint and compile artefacts that are no more than 24 hours old. Re-run `stella policy lint` and `stella policy compile` before submitting, and attach the latest diff files with `--attach`.
|
||||
|
||||
**Q:** *How do I layer tenant-specific overrides on top of the baseline policy?*
|
||||
**A:** Keep the baseline in `tenant-global`. For tenant overrides, create a policy referencing the baseline via CLI (`stella policy new --from baseline@<version>`), then adjust rules. Activation is per tenant.
|
||||
|
||||
**Q:** *Can I import YAML/Rego policies from earlier releases?*
|
||||
**A:** No direct import. Use the migration script (`stella policy migrate legacy.yaml`) which outputs `stella-dsl@1` skeletons. Review manually before submission.
|
||||
|
||||
---
|
||||
|
||||
## Simulation & Determinism
|
||||
|
||||
**Q:** *Simulation shows huge differences even though I only tweaked metadata. What did I miss?*
|
||||
**A:** Check whether your simulation used the same SBOM set and environment as previous runs. The CLI defaults to golden fixtures, whereas the UI can store custom presets. Large diffs may also indicate Concelier updates; compare advisory cursors in the Simulation tab.
|
||||
|
||||
**Q:** *How do we guard against non-deterministic behaviour?*
|
||||
**A:** CI runs `policy simulate` twice with identical inputs and compares outputs (`DEVOPS-POLICY-20-003`). Any difference fails the pipeline. Locally you can use `stella policy run replay` to verify determinism.
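
A minimal Python sketch of the comparison step, assuming both simulation runs emit JSON. How the CI job actually compares outputs is not described here, so the function names and the canonicalisation choice (sorted keys, compact separators) are illustrative assumptions.

```python
import hashlib
import json


def canonical_digest(result: dict) -> str:
    """Hash a simulation result after serialising it with sorted keys."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_deterministic(first_run: dict, second_run: dict) -> bool:
    """Two runs agree when their canonical digests match."""
    return canonical_digest(first_run) == canonical_digest(second_run)
```

Sorting keys before hashing means that dictionary ordering alone never counts as a difference, while any change in values or list order does.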
|
||||
|
||||
**Q:** *What happens if the determinism guard (`ERR_POL_004`) triggers?*
|
||||
**A:** Policy Engine halts the run, raises `policy.run.failed` with code `ERR_POL_004`, and switches to incident mode (100 % sampling). Review recent code changes; often caused by new helpers that call `DateTime.Now` or non-allowlisted HTTP clients.
|
||||
|
||||
---
|
||||
|
||||
## VEX & Suppressions
|
||||
|
||||
**Q:** *A vendor marked a CVE `not_affected` but the policy still blocks. Why?*
|
||||
**A:** Check the required justifications. Baseline policy only accepts `component_not_present` and `vulnerable_code_not_present`. Other justifications need explicit rules. Use `stella findings explain` to see which VEX statement was considered.
|
||||
|
||||
**Q:** *Can we quiet a finding indefinitely?*
|
||||
**A:** Avoid indefinite quiets. Policy DSL requires an `until` timestamp. If the use case is permanent, move the rule into baseline logic with strong justification and documentation.
|
||||
|
||||
**Q:** *How do we detect overuse of suppressions?*
|
||||
**A:** Observability exports `policy_suppressions_total` and CLI `stella policy stats`. Review weekly; Support flags tenants whose suppressions grow faster than remediation tickets.
|
||||
|
||||
---
|
||||
|
||||
## Runs & Operations
|
||||
|
||||
**Q:** *Incremental runs are backlogged. What should we check first?*
|
||||
**A:** Inspect the `policy_run_queue_depth` and `policy_delta_backlog_age_seconds` dashboards. If queue depth is high, scale worker replicas or investigate upstream change storms (Concelier/Excititor). Use `stella policy run list --status failed` for recent errors.
|
||||
|
||||
**Q:** *Full runs take longer than 30 min. Is that a breach?*
|
||||
**A:** The goal is ≤ 30 min, but large tenants may temporarily exceed it. Ensure Mongo indexes are current and that worker nodes meet sizing (4 vCPU). Consider sharding runs by SBOM group.
|
||||
|
||||
**Q:** *How do I replay a run for audit evidence?*
|
||||
**A:** `stella policy run replay <runId> --output replay.tgz` produces a sealed bundle. Upload to evidence locker or attach to incident tickets.
|
||||
|
||||
---
|
||||
|
||||
## Approvals & Governance
|
||||
|
||||
**Q:** *Can authors approve their own policies?*
|
||||
**A:** No. Authority denies approval if `approved_by == submitted_by`. Assign at least two reviewers (one security, one product).
|
||||
|
||||
**Q:** *What scopes do bots need for CI pipelines?*
|
||||
**A:** Typically `policy:read`, `policy:simulate`, `policy:runs`. Only grant `policy:run` if the pipeline should trigger runs. Never give CI tokens `policy:approve`.
|
||||
|
||||
**Q:** *How do we manage policies in air-gapped deployments?*
|
||||
**A:** Use `stella policy bundle export --sealed` on a connected site, transfer via approved media, then `stella policy bundle import` inside the enclave. Enable `--sealed` flag in CLI/UI to block accidental outbound calls.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Q:** *API calls return `403` despite valid token.*
|
||||
**A:** Verify scope includes the specific operation (`policy:activate` vs `policy:run`). Check tenant header matches token tenant. Inspect Authority logs for denial reason (`policy_scope_denied_total` metric).
|
||||
|
||||
**Q:** *`stella policy run` exits with code `30`.*
|
||||
**A:** Network/transport error. Check connectivity to Policy Engine endpoint, TLS configuration, and CLI proxy settings.
|
||||
|
||||
**Q:** *Explain drawer shows no VEX data.*
|
||||
**A:** Either no VEX statement matched or the tenant lacks `findings:read` scope. If VEX should exist, confirm Excititor ingestion and policy joiners (see Observability dashboards).
|
||||
|
||||
---
|
||||
|
||||
## Compliance Checklist
|
||||
|
||||
- [ ] FAQ linked from Console help menu and CLI `stella policy help`.
|
||||
- [ ] Entries reviewed quarterly by Policy & Support Guilds.
|
||||
- [ ] Answers cross-reference lifecycle, runs, observability, and governance docs.
|
||||
- [ ] Incident/Escalation contact details kept current in Support playbooks.
|
||||
- [ ] FAQ translated for supported locales (if applicable).
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 20).*
|
||||
|
||||
176
docs/ingestion/aggregation-only-contract.md
Normal file
@@ -0,0 +1,176 @@
|
||||
# Aggregation-Only Contract Reference
|
||||
|
||||
> The Aggregation-Only Contract (AOC) is the governing rule set that keeps StellaOps ingestion services deterministic, policy-neutral, and auditable. It applies to Concelier, Excititor, and any future collectors that write raw advisory or VEX documents.
|
||||
|
||||
## 1. Purpose and Scope
|
||||
|
||||
- Defines the canonical behaviour for `advisory_raw` and `vex_raw` collections and the linkset hints they may emit.
|
||||
- Applies to every ingestion runtime (`StellaOps.Concelier.*`, `StellaOps.Excititor.*`), the Authority scopes that guard them, and the DevOps/QA surfaces that verify compliance.
|
||||
- Complements the high-level architecture in [Concelier](../ARCHITECTURE_CONCELIER.md) and Authority enforcement documented in [Authority Architecture](../ARCHITECTURE_AUTHORITY.md).
|
||||
- Paired guidance: see the guard-rail checkpoints in [AOC Guardrails](../aoc/aoc-guardrails.md) and CLI usage that will land in `/docs/cli/` as part of Sprint 19 follow-up.
|
||||
|
||||
## 2. Philosophy and Goals
|
||||
|
||||
- Preserve upstream truth: ingestion only captures immutable raw facts plus provenance, never derived severity or policy decisions.
|
||||
- Defer interpretation: Policy Engine and downstream overlays remain the sole writers of materialised findings, severity, consensus, or risk scores.
|
||||
- Make every write explainable: provenance, signatures, and content hashes are required so operators can prove where each fact originated.
|
||||
- Keep outputs reproducible: identical inputs must yield identical documents, hashes, and linksets across replays and air-gapped installs.
|
||||
|
||||
## 3. Contract Invariants
|
||||
|
||||
| # | Invariant | What it forbids or requires | Enforcement surfaces |
|
||||
|---|-----------|-----------------------------|----------------------|
|
||||
| 1 | No derived severity at ingest | Reject top-level keys such as `severity`, `cvss`, `effective_status`, `consensus_provider`, `risk_score`. Raw upstream CVSS remains inside `content.raw`. | Mongo schema validator, `AOCWriteGuard`, Roslyn analyzer, `stella aoc verify`. |
|
||||
| 2 | No merges or opinionated dedupe | Each upstream document persists on its own; ingestion never collapses multiple vendors into one document. | Repository interceptors, unit/fixture suites. |
|
||||
| 3 | Provenance is mandatory | `source.*`, `upstream.*`, and `signature` metadata must be present; missing provenance triggers `ERR_AOC_004`. | Schema validator, guard, CLI verifier. |
|
||||
| 4 | Idempotent upserts | Writes keyed by `(vendor, upstream_id, content_hash)` either no-op or insert a new revision with `supersedes`. Duplicate hashes map to the same document. | Repository guard, storage unique index, CI smoke tests. |
|
||||
| 5 | Append-only revisions | Updates create a new document with `supersedes` pointer; no in-place mutation of content. | Mongo schema (`supersedes` format), guard, data migration scripts. |
|
||||
| 6 | Linkset only | Ingestion may compute link hints (`purls`, `cpes`, IDs) to accelerate joins, but must not transform or infer severity or policy. | Linkset builders reviewed via fixtures and analyzers. |
|
||||
| 7 | Policy-only effective findings | Only Policy Engine identities can write `effective_finding_*`; ingestion callers receive `ERR_AOC_006` if they attempt it. | Authority scopes, Policy Engine guard. |
|
||||
| 8 | Schema safety | Unknown top-level keys reject with `ERR_AOC_007`; timestamps use ISO 8601 UTC strings; tenant is required. | Mongo validator, JSON schema tests. |
|
||||
| 9 | Clock discipline | Collectors stamp `fetched_at` and `received_at` monotonically per batch to support reproducibility windows. | Collector contracts, QA fixtures. |
|
||||
|
||||
## 4. Raw Schemas
|
||||
|
||||
### 4.1 `advisory_raw`
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| `_id` | string | `advisory_raw:{source}:{upstream_id}:{revision}`; deterministic and tenant-scoped. |
|
||||
| `tenant` | string | Required; injected by Authority middleware and asserted by schema validator. |
|
||||
| `source.vendor` | string | Provider identifier (e.g., `redhat`, `osv`, `ghsa`). |
|
||||
| `source.stream` | string | Connector stream name (`csaf`, `osv`, etc.). |
|
||||
| `source.api` | string | Absolute URI of upstream document; stored for traceability. |
|
||||
| `source.collector_version` | string | Semantic version of the collector. |
|
||||
| `upstream.upstream_id` | string | Vendor- or ecosystem-provided identifier (CVE, GHSA, vendor ID). |
|
||||
| `upstream.document_version` | string | Upstream issued timestamp or revision string. |
|
||||
| `upstream.fetched_at` / `received_at` | string | ISO 8601 UTC timestamps recorded by the collector. |
|
||||
| `upstream.content_hash` | string | `sha256:` digest of the raw payload used for idempotency. |
|
||||
| `upstream.signature` | object | Required structure storing `present`, `format`, `key_id`, `sig`; even unsigned payloads set `present: false`. |
|
||||
| `content.format` | string | Source format (`CSAF`, `OSV`, etc.). |
|
||||
| `content.spec_version` | string | Upstream spec version when known. |
|
||||
| `content.raw` | object | Full upstream payload, untouched except for transport normalisation. |
|
||||
| `identifiers` | object | Normalised identifiers (`cve`, `ghsa`, `aliases`, etc.) derived losslessly from raw content. |
|
||||
| `linkset` | object | Join hints (see section 4.3). |
|
||||
| `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. |
|
||||
|
||||
### 4.2 `vex_raw`
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| `_id` | string | `vex_raw:{source}:{upstream_id}:{revision}`. |
|
||||
| `tenant` | string | Required; matches advisory collection requirements. |
|
||||
| `source.*` | object | Same shape and requirements as `advisory_raw`. |
|
||||
| `upstream.*` | object | Includes `document_version`, timestamps, `content_hash`, and `signature`. |
|
||||
| `content.format` | string | Typically `CycloneDX-VEX` or `CSAF-VEX`. |
|
||||
| `content.raw` | object | Entire upstream VEX payload. |
|
||||
| `identifiers.statements` | array | Normalised statement summaries (IDs, PURLs, status, justification) to accelerate policy joins. |
|
||||
| `linkset` | object | CVEs, GHSA IDs, and PURLs referenced in the document. |
|
||||
| `supersedes` | string or null | Same convention as advisory documents. |
|
||||
|
||||
### 4.3 Linkset Fields
|
||||
|
||||
- `purls`: fully qualified Package URLs extracted from raw ranges or product nodes.
|
||||
- `cpes`: Common Platform Enumerations when upstream docs provide them.
|
||||
- `aliases`: Any alternate advisory identifiers present in the payload.
|
||||
- `references`: Array of `{ type, url }` pairs pointing back to vendor advisories, patches, or exploits.
|
||||
- `reconciled_from`: Provenance of linkset entries (JSON Pointer or field origin) to make automated checks auditable.
|
||||
|
||||
### 4.4 `advisory_observations`
|
||||
|
||||
`advisory_observations` is an immutable projection of the validated raw document used by Link‑Not‑Merge overlays. Fields mirror the JSON contract surfaced by `StellaOps.Concelier.Models.Observations.AdvisoryObservation`.
|
||||
|
||||
| Field | Type | Notes |
|
||||
|-------|------|-------|
|
||||
| `_id` | string | Deterministic observation id — `{tenant}:{source.vendor}:{upstreamId}:{revision}`. |
|
||||
| `tenant` | string | Lower-case tenant identifier. |
|
||||
| `source.vendor` / `source.stream` | string | Connector identity (e.g., `vendor/redhat`, `ecosystem/osv`). |
|
||||
| `source.api` | string | Absolute URI the connector fetched from. |
|
||||
| `source.collectorVersion` | string | Optional semantic version of the connector build. |
|
||||
| `upstream.upstream_id` | string | Advisory identifier as issued by the provider (CVE, vendor ID, etc.). |
|
||||
| `upstream.document_version` | string | Upstream revision/version string. |
|
||||
| `upstream.fetchedAt` / `upstream.receivedAt` | datetime | UTC timestamps recorded by the connector. |
|
||||
| `upstream.contentHash` | string | `sha256:` digest used for idempotency. |
|
||||
| `upstream.signature` | object | `{present, format?, keyId?, signature?}` describing upstream signature material. |
|
||||
| `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
|
||||
| `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
|
||||
| `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). |
|
||||
| `linkset.aliases` | array | Normalized aliases (lower-case, sorted). |
|
||||
| `linkset.purls` | array | Normalized PURLs extracted from the document. |
|
||||
| `linkset.cpes` | array | Normalized CPE URIs. |
|
||||
| `linkset.references` | array | `{ type, url }` pairs (type lower-case). |
|
||||
| `createdAt` | datetime | Timestamp when Concelier persisted the observation. |
|
||||
| `attributes` | object | Optional provenance attributes keyed by connector. |
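
The linkset normalisation notes in the table (lower-case, sorted aliases; lower-case reference types) can be illustrated with a short Python sketch. The helper names are hypothetical, and the whitespace stripping and de-duplication are assumptions added for the example rather than documented connector behaviour.

```python
def normalise_aliases(aliases):
    """Lower-case, de-duplicate, and sort alias identifiers."""
    cleaned = {alias.strip().lower() for alias in aliases if alias.strip()}
    return sorted(cleaned)


def normalise_references(references):
    """Lower-case each reference type; URLs are kept verbatim."""
    return [{"type": ref["type"].lower(), "url": ref["url"]} for ref in references]
```

Because the output order depends only on the alias values, two connectors that see the same identifiers in a different order produce the same `linkset.aliases` array.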
|
||||
|
||||
## 5. Error Model
|
||||
|
||||
| Code | Description | HTTP status | Surfaces |
|
||||
|------|-------------|-------------|----------|
|
||||
| `ERR_AOC_001` | Forbidden field detected (severity, cvss, effective data). | 400 | Ingestion APIs, CLI verifier, CI guard. |
|
||||
| `ERR_AOC_002` | Merge attempt detected (multiple upstream sources fused into one document). | 400 | Ingestion APIs, CLI verifier. |
|
||||
| `ERR_AOC_003` | Idempotency violation (duplicate without supersedes pointer). | 409 | Repository guard, Mongo unique index, CLI verifier. |
|
||||
| `ERR_AOC_004` | Missing provenance metadata (`source`, `upstream`, `signature`). | 422 | Schema validator, ingestion endpoints. |
|
||||
| `ERR_AOC_005` | Signature or checksum mismatch. | 422 | Collector validation, CLI verifier. |
|
||||
| `ERR_AOC_006` | Attempt to persist derived findings from ingestion context. | 403 | Policy engine guard, Authority scopes. |
|
||||
| `ERR_AOC_007` | Unknown top-level fields (schema violation). | 400 | Mongo validator, CLI verifier. |
|
||||
|
||||
Consumers should map these codes to CLI exit codes and structured log events so automation can fail fast and produce actionable guidance.
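
As a rough illustration of how a guard might derive these codes, the Python sketch below checks a candidate raw document against three of the rules. It is not the `AOCWriteGuard` implementation: the allowed-key set is inferred from the `advisory_raw` table in section 4.1, the provenance check is simplified to key presence, and the function name is hypothetical.

```python
# Forbidden keys are listed in invariant 1; allowed keys are inferred from section 4.1.
FORBIDDEN_KEYS = {"severity", "cvss", "effective_status", "consensus_provider", "risk_score"}
ALLOWED_KEYS = {"_id", "tenant", "source", "upstream", "content",
                "identifiers", "linkset", "supersedes"}


def check_raw_document(doc: dict) -> list:
    """Return the AOC error codes a raw document would trigger (empty list = clean)."""
    codes = []
    keys = set(doc)
    if keys & FORBIDDEN_KEYS:
        codes.append("ERR_AOC_001")  # derived severity or effective data at ingest
    upstream = doc.get("upstream") or {}
    if "source" not in doc or "upstream" not in doc or "signature" not in upstream:
        codes.append("ERR_AOC_004")  # missing provenance metadata
    if keys - ALLOWED_KEYS - FORBIDDEN_KEYS:
        codes.append("ERR_AOC_007")  # unknown top-level field
    return codes
```

Returning every applicable code, rather than stopping at the first, lets a verifier report all violations for a document in one pass.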
|
||||
|
||||
## 6. API and Tooling Interfaces
|
||||
|
||||
- **Concelier ingestion** (`StellaOps.Concelier.WebService`)
|
||||
- `POST /ingest/advisory`: accepts upstream payload metadata; server-side guard constructs and persists raw document.
|
||||
- `GET /advisories/raw/{id}` and filterable list endpoints expose raw documents for debugging and offline analysis.
|
||||
- `POST /aoc/verify`: runs guard checks over recent documents and returns summary totals plus first violations.
|
||||
- **Excititor ingestion** (`StellaOps.Excititor.WebService`) mirrors the same surface for VEX documents.
|
||||
- **CLI workflows** (`stella aoc verify`, `stella sources ingest --dry-run`) surface pre-flight verification; documentation will live in `/docs/cli/` alongside Sprint 19 CLI updates.
|
||||
- **Authority scopes**: new `advisory:write`, `advisory:verify`, `vex:write`, and `vex:verify` scopes enforce least privilege; see [Authority Architecture](../ARCHITECTURE_AUTHORITY.md) for scope grammar.
|
||||
|
||||
## 7. Idempotency and Supersedes Rules
|
||||
|
||||
1. Compute `content_hash` before any transformation; use it with `(source.vendor, upstream.upstream_id)` to detect duplicates.
|
||||
2. If a document with the same hash already exists, skip the write and log a no-op.
|
||||
3. When a new hash arrives for an existing upstream document, insert a new record and set `supersedes` to the previous `_id`.
|
||||
4. Keep supersedes chains acyclic; collectors must resolve conflicts by rewinding before they insert.
|
||||
5. Expose idempotency counters via metrics (`ingestion_write_total{result=ok|noop}`) to catch regressions early.
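
Rules 1 to 3 can be sketched with an in-memory store in Python. This is a simplified model, not the repository guard: `RawStore` is a hypothetical stand-in for the `advisory_raw` collection, and the integer revision counter in `_id` is an assumption about how `{revision}` is filled.

```python
import hashlib


def content_hash(raw_payload: bytes) -> str:
    """Digest of the untouched upstream bytes, prefixed as in the schema."""
    return "sha256:" + hashlib.sha256(raw_payload).hexdigest()


class RawStore:
    """In-memory stand-in for the advisory_raw collection."""

    def __init__(self):
        self.documents = {}  # _id -> document
        self.by_hash = {}    # (vendor, upstream_id, content_hash) -> _id
        self.latest = {}     # (vendor, upstream_id) -> _id of newest revision

    def upsert(self, vendor: str, upstream_id: str, raw_payload: bytes) -> str:
        digest = content_hash(raw_payload)  # rule 1: hash before any transformation
        key = (vendor, upstream_id)
        if key + (digest,) in self.by_hash:
            return "noop"  # rule 2: same hash already stored, skip the write
        previous_id = self.latest.get(key)
        revision = sum(1 for k in self.by_hash if k[:2] == key) + 1
        doc_id = f"advisory_raw:{vendor}:{upstream_id}:{revision}"
        self.documents[doc_id] = {
            "_id": doc_id,
            "content_hash": digest,
            "supersedes": previous_id,  # rule 3: point at the prior revision
        }
        self.by_hash[key + (digest,)] = doc_id
        self.latest[key] = doc_id
        return "ok"
```

The return value mirrors the `result` label on `ingestion_write_total`, so a caller could increment the counter directly from it.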
|
||||
|
||||
## 8. Migration Playbook
|
||||
|
||||
1. Freeze ingestion writes except for raw pass-through paths while deploying schema validators.
|
||||
2. Snapshot existing collections to `_backup_*` for rollback safety.
|
||||
3. Strip forbidden fields from historical documents into a temporary `advisory_view_legacy` used only during transition.
|
||||
4. Enable Mongo JSON schema validators for `advisory_raw` and `vex_raw`.
|
||||
5. Run collectors in `--dry-run` to confirm only allowed keys appear; fix violations before lifting the freeze.
|
||||
6. Point Policy Engine to consume exclusively from raw collections and compute derived outputs downstream.
|
||||
7. Delete legacy normalisation paths from ingestion code and enable runtime guards plus CI linting.
|
||||
8. Roll forward CLI, Console, and dashboards so operators can monitor AOC status end-to-end.
|
||||
|
||||
## 9. Observability and Diagnostics
|
||||
|
||||
- **Metrics**: `ingestion_write_total{result=ok|reject|noop}`, `aoc_violation_total{code}`, `ingestion_signature_verified_total{result}`, `ingestion_latency_seconds`, `advisory_revision_count`.
|
||||
- **Traces**: spans `ingest.fetch`, `ingest.transform`, `ingest.write`, and `aoc.guard` with correlation IDs shared across workers.
|
||||
- **Logs**: structured entries must include `tenant`, `source.vendor`, `upstream.upstream_id`, `content_hash`, and `violation_code` when applicable.
|
||||
- **Dashboards**: DevOps should add panels for violation counts, signature failures, supersedes growth, and CLI verifier outcomes for each tenant.
|
||||
|
||||
## 10. Security and Tenancy Checklist
|
||||
|
||||
- Enforce Authority scopes (`advisory:write`, `vex:write`, `advisory:verify`, `vex:verify`) and require tenant claims on every request.
|
||||
- Maintain pinned trust stores for signature verification; capture verification result in metrics and logs.
|
||||
- Ensure collectors never log secrets or raw authentication headers; redact tokens before persistence.
|
||||
- Validate that Policy Engine remains the only identity with permission to write `effective_finding_*` documents.
|
||||
- Verify offline bundles include the raw collections, guard configuration, and verifier binaries so air-gapped installs can audit parity.
|
||||
- Document operator steps for recovering from violations, including rollback to superseded revisions and re-running policy evaluation.
|
||||
|
||||
## 11. Compliance Checklist
|
||||
|
||||
- [ ] Deterministic guard enabled in Concelier and Excititor repositories.
|
||||
- [ ] Mongo validators deployed for `advisory_raw` and `vex_raw`.
|
||||
- [ ] Authority scopes and tenant enforcement verified via integration tests.
|
||||
- [ ] CLI and CI pipelines run `stella aoc verify` against seeded snapshots.
|
||||
- [ ] Observability feeds (metrics, logs, traces) wired into dashboards with alerts.
|
||||
- [ ] Offline kit instructions updated to bundle validators and verifier tooling.
|
||||
- [ ] Security review recorded covering ingestion, tenancy, and rollback procedures.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 19).*
|
||||
141
docs/observability/observability.md
Normal file
@@ -0,0 +1,141 @@
|
||||
# AOC Observability Guide
|
||||
|
||||
> **Audience:** Observability Guild, Concelier/Excititor SREs, platform operators.
|
||||
> **Scope:** Metrics, traces, logs, dashboards, and runbooks introduced as part of the Aggregation-Only Contract (AOC) rollout (Sprint 19).
|
||||
|
||||
This guide captures the canonical signals emitted by Concelier and Excititor once AOC guards are active. It explains how to consume the metrics in dashboards, correlate traces/logs for incident triage, and operate in offline environments. Pair this guide with the [AOC reference](../ingestion/aggregation-only-contract.md) and [architecture overview](../architecture/overview.md).
|
||||
|
||||
---
|
||||
|
||||
## 1 · Metrics
|
||||
|
||||
| Metric | Type | Labels | Description |
|
||||
|--------|------|--------|-------------|
|
||||
| `ingestion_write_total` | Counter | `source`, `tenant`, `result` (`ok`, `reject`, `noop`) | Counts write attempts to `advisory_raw`/`vex_raw`. Rejects correspond to guard failures. |
|
||||
| `ingestion_latency_seconds` | Histogram | `source`, `tenant`, `phase` (`fetch`, `transform`, `write`) | Measures the runtime of each ingestion phase. Alert on the 95th percentile (P95). |
|
||||
| `aoc_violation_total` | Counter | `source`, `tenant`, `code` (`ERR_AOC_00x`) | Total guard violations bucketed by error code. Drives dashboard pills and alert thresholds. |
|
||||
| `ingestion_signature_verified_total` | Counter | `source`, `tenant`, `result` (`ok`, `fail`, `skipped`) | Tracks signature/checksum verification outcomes. |
|
||||
| `advisory_revision_count` | Gauge | `source`, `tenant` | Supersedes depth for raw documents; spikes indicate noisy upstream feeds. |
|
||||
| `verify_runs_total` | Counter | `tenant`, `initiator` (`ui`, `cli`, `api`, `scheduled`) | Number of `stella aoc verify` or `/aoc/verify` runs executed. |
|
||||
| `verify_duration_seconds` | Histogram | `tenant`, `initiator` | Runtime of verification jobs; use P95 to detect regressions. |
|
||||
|
||||
### 1.1 Alerts
|
||||
|
||||
- **Violation spike:** Alert when `increase(aoc_violation_total[15m]) > 0` for critical sources. Page SRE if `code="ERR_AOC_005"` (signature failure) or `ERR_AOC_001` persists > 30 min.
|
||||
- **Stale ingestion:** Alert when `max_over_time((ingestion_latency_seconds_sum / ingestion_latency_seconds_count)[30m:])` exceeds 30 s, or when `ingestion_write_total` shows no growth for more than 60 min.
|
||||
- **Signature drop:** Warn when `rate(ingestion_signature_verified_total{result="fail"}[1h]) > 0`.
|
||||
|
||||
---
|
||||
|
||||
## 2 · Traces
|
||||
|
||||
### 2.1 Span taxonomy
|
||||
|
||||
| Span name | Parent | Key attributes |
|
||||
|-----------|--------|----------------|
|
||||
| `ingest.fetch` | job root span | `source`, `tenant`, `uri`, `contentHash` |
|
||||
| `ingest.transform` | `ingest.fetch` | `documentType` (`csaf`, `osv`, `vex`), `payloadBytes` |
|
||||
| `ingest.write` | `ingest.transform` | `collection` (`advisory_raw`, `vex_raw`), `result` (`ok`, `reject`) |
|
||||
| `aoc.guard` | `ingest.write` | `code` (on violation), `violationCount`, `supersedes` |
|
||||
| `verify.run` | verification job root | `tenant`, `window.from`, `window.to`, `sources`, `violations` |
|
||||
|
||||
### 2.2 Trace usage
|
||||
|
||||
- Correlate UI dashboard entries with traces via `traceId` surfaced in violation drawers (`docs/ui/console.md`).
|
||||
- Use `aoc.guard` spans to inspect guard payload snapshots. Sensitive fields are redacted automatically; raw JSON lives in secure logs only.
|
||||
- For scheduled verification, filter traces by `initiator="scheduled"` to compare runtimes pre/post change.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Logs
|
||||
|
||||
Structured logs include the following keys (JSON):
|
||||
|
||||
| Key | Description |
|
||||
|-----|-------------|
|
||||
| `traceId` | Matches OpenTelemetry trace/span IDs for cross-system correlation. |
|
||||
| `tenant` | Tenant identifier enforced by Authority middleware. |
|
||||
| `source.vendor` | Logical source (e.g., `redhat`, `ubuntu`, `osv`, `ghsa`). |
|
||||
| `upstream.upstreamId` | Vendor-provided ID (CVE, GHSA, etc.). |
|
||||
| `contentHash` | `sha256:` digest of the raw document. |
|
||||
| `violation.code` | Present when guard rejects `ERR_AOC_00x`. |
|
||||
| `verification.window` | Present on `/aoc/verify` job logs. |
|
||||
|
||||
Logs are shipped to the central Loki/Elasticsearch cluster. Use the template query:
|
||||
|
||||
```logql
|
||||
{app="concelier-web"} | json | violation_code != ""
|
||||
```
|
||||
|
||||
to spot active AOC violations.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Dashboards
|
||||
|
||||
Primary Grafana dashboard: **“AOC Ingestion Health”** (`dashboards/aoc-ingestion.json`). Panels include:
|
||||
|
||||
1. **Sources overview:** table fed by `ingestion_write_total` and `ingestion_latency_seconds` (mirrors Console tiles).
|
||||
2. **Violation trend:** stacked bar chart of `aoc_violation_total` per code.
|
||||
3. **Signature success rate:** timeseries derived from `ingestion_signature_verified_total`.
|
||||
4. **Supersedes depth:** gauge showing `advisory_revision_count` P95.
|
||||
5. **Verification runs:** histogram and latency boxplot using `verify_runs_total` / `verify_duration_seconds`.
|
||||
|
||||
Secondary dashboards:
|
||||
|
||||
- **AOC Alerts (Ops view):** summarises active alerts, last verify run, and links to incident runbook.
|
||||
- **Offline Mode Dashboard:** fed from Offline Kit imports; highlights snapshot age and queued verification jobs.
|
||||
|
||||
Update `docs/assets/dashboards/` with screenshots when Grafana capture pipeline produces the latest renders.
|
||||
|
||||
---
|
||||
|
||||
## 5 · Operational workflows
|
||||
|
||||
1. **During ingestion incident:**
|
||||
- Check Console dashboard for offending sources.
|
||||
- Pivot to logs using document `contentHash`.
|
||||
- Re-run `stella sources ingest --dry-run` with problematic payloads to validate fixes.
|
||||
- After remediation, run `stella aoc verify --since 24h` and confirm exit code `0`.
|
||||
2. **Scheduled verification:**
|
||||
- Configure cron job to run `stella aoc verify --format json --export ...`.
|
||||
- Ship JSON to `aoc-verify` bucket and ingest into metrics using custom exporter.
|
||||
- Alert on missing exports (no file uploaded within 26 h).
|
||||
3. **Offline kit validation:**
|
||||
- Use Offline Dashboard to ensure snapshots contain latest metrics.
|
||||
- Run verification reports locally and attach to bundle before distribution.
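
The "missing export" alert in the scheduled verification workflow reduces to a freshness check. A minimal Python sketch follows, assuming the exporter can supply the timestamp of the most recent upload; the function name and the way that timestamp is obtained are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_EXPORT_AGE = timedelta(hours=26)  # threshold from the workflow above


def export_is_overdue(last_upload: Optional[datetime], now: datetime) -> bool:
    """True when no verification export has arrived within the allowed window."""
    if last_upload is None:
        return True  # nothing has ever been uploaded
    return now - last_upload > MAX_EXPORT_AGE
```

The 26 h window leaves two hours of slack over a daily schedule, so a single slow run does not page anyone.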
|
||||
|
||||
---
|
||||
|
||||
## 6 · Offline considerations
|
||||
|
||||
- Metrics exporters bundled with Offline Kit write to local Prometheus snapshots; sync them with central Grafana once connectivity is restored.
|
||||
- CLI verification reports should be hashed (`sha256sum`) and archived for audit trails.
|
||||
- Dashboards include offline data sources (`prometheus-offline`) switchable via dropdown.

---

## 7 · References

- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
- [Architecture overview](../architecture/overview.md)
- [Console AOC dashboard](../ui/console.md)
- [CLI AOC commands](../cli/cli-reference.md)
- [Concelier architecture](../ARCHITECTURE_CONCELIER.md)
- [Excititor architecture](../ARCHITECTURE_EXCITITOR.md)

---

## 8 · Compliance checklist

- [ ] Metrics documented with label sets and alert guidance.
- [ ] Tracing span taxonomy aligned with Concelier/Excititor implementation.
- [ ] Log schema matches structured logging contracts (traceId, tenant, source, contentHash).
- [ ] Grafana dashboard references verified and screenshots scheduled.
- [ ] Offline/air-gap workflow captured.
- [ ] Cross-links to AOC reference, console, and CLI docs included.
- [ ] Observability Guild sign-off scheduled (OWNER: @obs-guild, due 2025-10-28).

---

*Last updated: 2025-10-26 (Sprint 19).*

166
docs/observability/policy.md
Normal file
@@ -0,0 +1,166 @@
# Policy Engine Observability

> **Audience:** Observability Guild, SRE/Platform operators, Policy Guild.
> **Scope:** Metrics, logs, traces, dashboards, alerting, sampling, and incident workflows for the Policy Engine service (Sprint 20).
> **Prerequisites:** Policy Engine v2 deployed with OpenTelemetry exporters enabled (`observability:enabled=true` in config).

---

## 1 · Instrumentation Overview

- **Telemetry stack:** OpenTelemetry SDK (metrics + traces), Serilog structured logging, OTLP exporters → Collector → Prometheus/Loki/Tempo.
- **Namespace conventions:** `policy.*` for metrics/traces/log categories; labels use `tenant`, `policy`, `mode`, `runId`.
- **Sampling:** Default 10 % trace sampling, 1 % rule-hit log sampling; incident mode overrides to 100 % (see §7).
- **Correlation IDs:** Every API request gets `traceId` + `requestId`. CLI/UI display IDs to streamline support.
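
The sampling defaults and the incident-mode override can be pictured with a deterministic ratio sampler. The sketch below is an assumption about how such a decision could be made, not the Policy Engine's actual sampler: it compares the low 64 bits of the trace ID against `ratio * 2**64`, so the same trace ID always gets the same answer. All names are hypothetical.

```python
DEFAULT_TRACE_RATIO = 0.10     # 10 % of traces
DEFAULT_RULE_HIT_RATIO = 0.01  # 1 % of rule-hit logs


def effective_ratio(default_ratio: float, incident_mode: bool) -> float:
    """Incident mode overrides every sampling ratio to 100 %."""
    return 1.0 if incident_mode else default_ratio


def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Deterministic decision based on the low 64 bits of the trace ID."""
    low_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_bits < int(ratio * (1 << 64))
```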

---

## 2 · Metrics

### 2.1 Run Pipeline

| Metric | Type | Labels | Notes |
|--------|------|--------|-------|
| `policy_run_seconds` | Histogram | `tenant`, `policy`, `mode` (`full`, `incremental`, `simulate`) | P95 target ≤ 5 min incremental, ≤ 30 min full. |
| `policy_run_queue_depth` | Gauge | `tenant` | Number of pending jobs per tenant (updated each enqueue/dequeue). |
| `policy_run_failures_total` | Counter | `tenant`, `policy`, `reason` (`err_pol_*`, `network`, `cancelled`) | Aligns with error codes. |
| `policy_run_retries_total` | Counter | `tenant`, `policy` | Helps identify noisy sources. |
| `policy_run_inputs_pending_bytes` | Gauge | `tenant` | Size of buffered change batches awaiting run. |
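
Because `policy_run_seconds` is a histogram, the P95 targets are evaluated from cumulative bucket counts rather than raw samples. The sketch below shows the usual linear-interpolation estimate (the same idea as Prometheus `histogram_quantile`); the bucket boundaries in the usage note are invented for illustration and are not the service's real bucket layout.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # quantile falls in the overflow bucket
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

With hypothetical buckets `[(60, 50), (300, 90), (1800, 100), (math.inf, 100)]`, the P95 estimate is about 1050 s, which would breach the 5 min incremental target.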

### 2.2 Evaluator Insights

| Metric | Type | Labels | Notes |
|--------|------|--------|-------|
| `policy_rules_fired_total` | Counter | `tenant`, `policy`, `rule` | Increment per rule match (sampled). |
| `policy_vex_overrides_total` | Counter | `tenant`, `policy`, `vendor`, `justification` | Tracks VEX precedence decisions. |
| `policy_suppressions_total` | Counter | `tenant`, `policy`, `action` (`ignore`, `warn`, `quiet`) | Audits suppression usage. |
| `policy_selection_batch_duration_seconds` | Histogram | `tenant`, `policy` | Measures joiner performance. |
| `policy_materialization_conflicts_total` | Counter | `tenant`, `policy` | Non-zero indicates optimistic concurrency retries. |

### 2.3 API Surface

| Metric | Type | Labels | Notes |
|--------|------|--------|-------|
| `policy_api_requests_total` | Counter | `endpoint`, `method`, `status` | Exposed via Minimal API instrumentation. |
| `policy_api_latency_seconds` | Histogram | `endpoint`, `method` | Budget ≤ 250 ms for GETs, ≤ 1 s for POSTs. |
| `policy_api_rate_limited_total` | Counter | `endpoint` | Tied to throttles (`429`). |

### 2.4 Queue & Change Streams

| Metric | Type | Labels | Notes |
|--------|------|--------|-------|
| `policy_queue_leases_active` | Gauge | `tenant` | Number of leased jobs. |
| `policy_queue_lease_expirations_total` | Counter | `tenant` | Alerts when workers fail to ack. |
| `policy_delta_backlog_age_seconds` | Gauge | `tenant`, `source` (`concelier`, `excititor`, `sbom`) | Age of oldest unprocessed change event. |

---

## 3 · Logs

- **Format:** JSON (`Serilog`). Core fields: `timestamp`, `level`, `message`, `policyId`, `policyVersion`, `tenant`, `runId`, `rule`, `traceId`, `env.sealed`, `error.code`.
- **Log categories:**
  - `policy.run` (queue lifecycle, run begin/end, stats)
  - `policy.evaluate` (batch execution summaries; rule-hit sampling)
  - `policy.materialize` (Mongo operations, conflicts, retries)
  - `policy.simulate` (diff results, CLI invocation metadata)
  - `policy.lifecycle` (submit/review/approve events)
- **Sampling:** Rule-hit logs are sampled at 1 % by default; sampling rises to 100 % in incident mode or when the `--trace` flag is used in the CLI.
- **PII:** No user secrets recorded; user identities referenced as `user:<id>` or `group:<id>` only.
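
To make the field list concrete, the sketch below builds one JSON log event with the core fields and formats identities as `user:<id>` or `group:<id>`. It is written in Python purely for illustration (the service itself logs through Serilog); the helper names, the sample values, and the flat dotted keys for `env.sealed` and `error.code` are assumptions about the layout rather than the confirmed output shape.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def principal_ref(kind: str, identifier: str) -> str:
    """Reference an identity as `user:<id>` or `group:<id>` only."""
    if kind not in ("user", "group"):
        raise ValueError(f"unsupported principal kind: {kind}")
    return f"{kind}:{identifier}"


def policy_log_entry(level: str, message: str, *, policy_id: str,
                     policy_version: int, tenant: str, run_id: str,
                     trace_id: str, sealed: bool = False,
                     rule: Optional[str] = None,
                     error_code: Optional[str] = None) -> str:
    """Serialise one log event carrying the core fields listed above."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "policyId": policy_id,
        "policyVersion": policy_version,
        "tenant": tenant,
        "runId": run_id,
        "rule": rule,
        "traceId": trace_id,
        "env.sealed": sealed,      # flat dotted key: assumed layout
        "error.code": error_code,  # flat dotted key: assumed layout
    }
    return json.dumps(entry)
```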

---

## 4 · Traces

- Spans are emitted via OpenTelemetry instrumentation.
- **Primary spans:**
  - `policy.api` – wraps HTTP request, records `endpoint`, `status`, `scope`.
  - `policy.select` – change stream ingestion and batch assembly (attributes: `candidateCount`, `cursor`).
  - `policy.evaluate` – evaluation batch (attributes: `batchSize`, `ruleHits`, `severityChanges`).
  - `policy.materialize` – Mongo writes (attributes: `writes`, `historyWrites`, `retryCount`).
  - `policy.simulate` – simulation diff generation (attributes: `sbomCount`, `diffAdded`, `diffRemoved`).
- Trace context is propagated to the CLI via the `traceparent` response header; the UI surfaces it in the run detail view.
- Incident mode forces span sampling to 100 % and extends retention via Collector config override.
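
A client that wants to display the trace ID from the `traceparent` header can parse it according to the W3C Trace Context layout (`version-traceid-parentid-flags`). The sketch below is a minimal, strict parser for that four-field layout; it is an illustration and not the CLI's actual implementation, and the type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

_TRACEPARENT = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None when it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None or match["version"] == "ff":
        return None  # wrong shape, or the reserved invalid version
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None  # all-zero IDs are invalid
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceParent(match["trace_id"], match["parent_id"], sampled)
```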

---

## 5 · Dashboards

### 5.1 Policy Runs Overview

Widgets:
- Run duration histogram (per mode/tenant).
- Queue depth + backlog age line charts.
- Failure rate stacked by error code.
- Incremental backlog heatmap (policy × age).
- Active vs scheduled runs table.

### 5.2 Rule Impact & VEX

- Top N rules by firings (bar chart).
- VEX overrides by vendor/justification (stacked chart).
- Suppression usage (pie + table with justifications).
- Quieted findings trend (line).

### 5.3 Simulation & Approval Health

- Simulation diff histogram (added vs removed).
- Pending approvals by age (table with SLA colour coding).
- Compliance checklist status (lint, determinism CI, simulation evidence).

> Placeholders for Grafana panels should be replaced with actual screenshots once dashboards land (`../assets/policy-observability/*.png`).

---

## 6 · Alerting

| Alert | Condition | Suggested Action |
|-------|-----------|------------------|
| **PolicyRunSlaBreach** | `policy_run_seconds{mode="incremental"}` P95 > 300 s for 3 windows | Check queue depth, upstream services, scale worker pool. |
| **PolicyQueueStuck** | `policy_delta_backlog_age_seconds` > 600 | Investigate change stream connectivity. |
| **DeterminismMismatch** | Run status `failed` with `ERR_POL_004` OR CI replay diff | Switch to incident sampling, gather replay bundle, notify Policy Guild. |
| **SimulationDrift** | CLI/CI simulation exit `20` (blocking diff) over threshold | Review policy changes before approval. |
| **VexOverrideSpike** | `policy_vex_overrides_total` > configured baseline (per vendor) | Verify upstream VEX feed; ensure justification codes expected. |
| **SuppressionSurge** | `policy_suppressions_total` increase > 3σ vs baseline | Audit new suppress rules; check approvals. |

Alerts integrate with Notifier channels (`policy.alerts`) and Ops on-call rotations.
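
The 3σ condition behind **SuppressionSurge** can be illustrated with a simple baseline comparison. This is a sketch under stated assumptions: the baseline is a list of per-window increases in `policy_suppressions_total` collected by the caller, the population standard deviation is used, and the function name is hypothetical. A flat baseline (σ = 0) flags any increase above the mean, so real deployments may want a minimum floor.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_surge(baseline: Sequence[float], current: float, sigmas: float = 3.0) -> bool:
    """True when `current` exceeds the baseline mean by more than `sigmas` deviations."""
    if len(baseline) < 2:
        return False  # not enough history to estimate a spread
    threshold = mean(baseline) + sigmas * pstdev(baseline)
    return current > threshold
```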

---

## 7 · Incident Mode & Forensics

- Toggle via `POST /api/policy/incidents/activate` (requires `policy:operate` scope).
- Effects:
  - Trace sampling → 100 %.
  - Rule-hit log sampling → 100 %.
  - Retention window extended to 30 days for incident duration.
  - `policy.incident.activated` event emitted (Console + Notifier banners).
- Post-incident tasks:
  - `stella policy run replay` for affected runs; attach bundles to incident record.
  - Restore sampling defaults with `.../deactivate`.
  - Update incident checklist in `/docs/policy/lifecycle.md` (section 8) with findings.

---

## 8 · Integration Points

- **Authority:** Exposes metric `policy_scope_denied_total` for failed authorisation; correlate with `policy_api_requests_total`.
- **Concelier/Excititor:** Shared trace IDs propagate via gRPC metadata to help debug upstream latency.
- **Scheduler:** Future integration will push run queues into shared scheduler dashboards (planned in SCHED-MODELS-20-002).
- **Offline Kit:** CLI exports logs + metrics snapshots (`stella offline bundle metrics`) for air-gapped audits.

---

## 9 · Compliance Checklist

- [ ] **Metrics registered:** All metrics listed above exported and documented in Grafana dashboards.
- [ ] **Alert policies configured:** Ops or Observability Guild created alerts matching table in §6.
- [ ] **Sampling overrides tested:** Incident mode toggles verified in staging; retention roll-back rehearsed.
- [ ] **Trace propagation validated:** CLI/UI display trace IDs and allow copy for support.
- [ ] **Log scrubbing enforced:** Unit tests guarantee no secrets/PII in logs; sampling respects configuration.
- [ ] **Offline capture rehearsed:** Metrics/log snapshot commands executed in sealed environment.
- [ ] **Docs cross-links:** Links to architecture, runs, lifecycle, CLI, API docs verified.

---

*Last updated: 2025-10-26 (Sprint 20).*

@@ -1,152 +1,159 @@
# Concelier Authority Audit Runbook

# Concelier Authority Audit Runbook

_Last updated: 2025-10-22_

This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.

## 1. Prerequisites

This runbook helps operators verify and monitor the StellaOps Concelier ⇆ Authority integration. It focuses on the `/jobs*` surface, which now requires StellaOps Authority tokens, and the corresponding audit/metric signals that expose authentication and bypass activity.

## 1. Prerequisites

- Authority integration is enabled in `concelier.yaml` (or via `CONCELIER_AUTHORITY__*` environment variables) with a valid `clientId`, secret, audience, and required scopes.
- OTLP metrics/log exporters are configured (`concelier.telemetry.*`) or container stdout is shipped to your SIEM.
- Operators have access to the Concelier job trigger endpoints via CLI or REST for smoke tests.
- The rollout table in `docs/10_CONCELIER_CLI_QUICKSTART.md` has been reviewed so stakeholders align on the staged → enforced toggle timeline.

### Configuration snippet

```yaml
concelier:
  authority:
    enabled: true
    allowAnonymousFallback: false # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "concelier-jobs"
    clientSecretFile: "/run/secrets/concelier_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
```

> Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.

### Resilience tuning

- **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.

## 2. Key Signals

### 2.1 Audit log channel

Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.

```
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger bypass=False remote=10.1.4.7
```

| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `concelier-cli` | OAuth client ID provided by Authority ( `(none)` if the token lacked the claim). |
| `scopes` | `concelier.jobs.trigger` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |

Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:

- `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration.
- Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.

### 2.2 Metrics

Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.

| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |

> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline’s generated metric names.

Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`:

- `concelier.source.http.requests_total{concelier_source="jobs-run"}` – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.

## 3. Alerting Guidance

1. **Unauthorized bypass attempt**
   - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
   - Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.

2. **Missing scopes**
   - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
   - Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`.

3. **Trigger failure surge**
   - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
   - Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors.

4. **Conflict spike**
   - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
   - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.

5. **Authority offline**
   - Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.

## 4. Rollout & Verification Procedure

### Configuration snippet

```yaml
concelier:
  authority:
    enabled: true
    allowAnonymousFallback: false # keep true only during initial rollout
    issuer: "https://authority.internal"
    audiences:
      - "api://concelier"
    requiredScopes:
      - "concelier.jobs.trigger"
      - "advisory:read"
      - "advisory:ingest"
    requiredTenants:
      - "tenant-default"
    bypassNetworks:
      - "127.0.0.1/32"
      - "::1/128"
    clientId: "concelier-jobs"
    clientSecretFile: "/run/secrets/concelier_authority_client"
    tokenClockSkewSeconds: 60
    resilience:
      enableRetries: true
      retryDelays:
        - "00:00:01"
        - "00:00:02"
        - "00:00:05"
      allowOfflineCacheFallback: true
      offlineCacheTolerance: "00:10:00"
```

> Store secrets outside source control. Concelier reads `clientSecretFile` on startup; rotate by updating the mounted file and restarting the service.

### Resilience tuning

- **Connected sites:** keep the default 1 s / 2 s / 5 s retry ladder so Concelier retries transient Authority hiccups but still surfaces outages quickly. Leave `allowOfflineCacheFallback=true` so cached discovery/JWKS data can bridge short Pathfinder restarts.
- **Air-gapped/Offline Kit installs:** extend `offlineCacheTolerance` (15–30 minutes) to keep the cached metadata valid between manual synchronisations. You can also disable retries (`enableRetries=false`) if infrastructure teams prefer to handle exponential backoff at the network layer; Concelier will fail fast but keep deterministic logs.
- Concelier resolves these knobs through `IOptionsMonitor<StellaOpsAuthClientOptions>`. Edits to `concelier.yaml` are applied on configuration reload; restart the container if you change environment variables or do not have file-watch reloads enabled.
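
The 1 s / 2 s / 5 s retry ladder and the `enableRetries` switch can be pictured as follows. This is a language-neutral sketch written in Python, not Concelier's actual client code: the function name is hypothetical, every exception is treated as transient for brevity, and the offline cache fallback is not modelled.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T],
                      delays: Sequence[float] = (1.0, 2.0, 5.0),
                      enable_retries: bool = True,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, waiting 1 s, 2 s, then 5 s between failed attempts."""
    schedule = list(delays) if enable_retries else []
    for delay in schedule:
        try:
            return operation()
        except Exception:
            sleep(delay)  # assumed transient: wait, then try again
    return operation()  # final attempt; any error propagates to the caller
```

With retries enabled the operation runs at most four times; with `enable_retries=False` it runs once and fails fast, matching the air-gapped option described above.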

## 2. Key Signals

### 2.1 Audit log channel

Concelier emits structured audit entries via the `Concelier.Authorization.Audit` logger for every `/jobs*` request once Authority enforcement is active.

```
Concelier authorization audit route=/jobs/definitions status=200 subject=ops@example.com clientId=concelier-cli scopes=concelier.jobs.trigger advisory:ingest bypass=False remote=10.1.4.7
```

| Field | Sample value | Meaning |
|--------------|-------------------------|------------------------------------------------------------------------------------------|
| `route` | `/jobs/definitions` | Endpoint that processed the request. |
| `status` | `200` / `401` / `409` | Final HTTP status code returned to the caller. |
| `subject` | `ops@example.com` | User or service principal subject (falls back to `(anonymous)` when unauthenticated). |
| `clientId` | `concelier-cli` | OAuth client ID provided by Authority (`(none)` if the token lacked the claim). |
| `scopes` | `concelier.jobs.trigger advisory:ingest advisory:read` | Normalised scope list extracted from token claims; `(none)` if the token carried none. |
| `tenant` | `tenant-default` | Tenant claim extracted from the Authority token (`(none)` when the token lacked it). |
| `bypass` | `True` / `False` | Indicates whether the request succeeded because its source IP matched a bypass CIDR. |
| `remote` | `10.1.4.7` | Remote IP recorded from the connection / forwarded header test hooks. |

Use your logging backend (e.g., Loki) to index the logger name and filter for suspicious combinations:

- `status=401 AND bypass=True` – bypass network accepted an unauthenticated call (should be temporary during rollout).
- `status=202 AND scopes="(none)"` – a token without scopes triggered a job; tighten client configuration.
- `status=202 AND NOT contains(scopes,"advisory:ingest")` – ingestion attempted without the new AOC scopes; confirm the Authority client registration matches the sample above.
- `tenant!=(tenant-default)` – indicates a cross-tenant token was accepted. Ensure Concelier `requiredTenants` is aligned with Authority client registration.
- Spike in `clientId="(none)"` – indicates upstream Authority is not issuing `client_id` claims or the CLI is outdated.
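
If your logging backend cannot express these filters directly, the same checks can be run over raw audit lines. The sketch below parses the `key=value` layout from the sample entry (a value such as the scope list may contain spaces) and reports which of the combinations above apply. The function names are hypothetical, and a line without a `tenant` field is treated as matching the expected tenant, which is an assumption.

```python
from typing import Dict, List

PREFIX = "Concelier authorization audit "


def parse_audit_line(line: str) -> Dict[str, str]:
    """Split an audit line into fields; values may span several tokens."""
    if line.startswith(PREFIX):
        line = line[len(PREFIX):]
    fields: Dict[str, str] = {}
    key = None
    for token in line.split():
        name, sep, value = token.partition("=")
        if sep and name.isidentifier():
            key = name
            fields[key] = value
        elif key is not None:
            fields[key] += " " + token  # continuation, e.g. a second scope
    return fields


def suspicious_reasons(fields: Dict[str, str],
                       expected_tenant: str = "tenant-default") -> List[str]:
    """Return the suspicious combinations that match one audit entry."""
    reasons: List[str] = []
    status = fields.get("status")
    scopes = fields.get("scopes", "(none)")
    if status == "401" and fields.get("bypass") == "True":
        reasons.append("unauthenticated call from a bypass network")
    if status == "202" and scopes == "(none)":
        reasons.append("job triggered by a token without scopes")
    elif status == "202" and "advisory:ingest" not in scopes.split():
        reasons.append("ingestion attempted without advisory:ingest")
    if fields.get("tenant", expected_tenant) != expected_tenant:
        reasons.append("cross-tenant token accepted")
    return reasons
```

Running `suspicious_reasons(parse_audit_line(line))` on the sample entry above yields an empty list.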

### 2.2 Metrics

Concelier publishes counters under the OTEL meter `StellaOps.Concelier.WebService.Jobs`. Tags: `job.kind`, `job.trigger`, `job.outcome`.

| Metric name | Description | PromQL example |
|-------------------------------|----------------------------------------------------|----------------|
| `web.jobs.triggered` | Accepted job trigger requests. | `sum by (job_kind) (rate(web_jobs_triggered_total[5m]))` |
| `web.jobs.trigger.conflict` | Rejected triggers (already running, disabled…). | `sum(rate(web_jobs_trigger_conflict_total[5m]))` |
| `web.jobs.trigger.failed` | Server-side job failures. | `sum(rate(web_jobs_trigger_failed_total[5m]))` |

> Prometheus/OTEL collectors typically surface counters with `_total` suffix. Adjust queries to match your pipeline’s generated metric names.

Correlate audit logs with the following global meter exported via `Concelier.SourceDiagnostics`:

- `concelier.source.http.requests_total{concelier_source="jobs-run"}` – ensures REST/manual triggers route through Authority.
- If Grafana dashboards are deployed, extend the “Concelier Jobs” board with the above counters plus a table of recent audit log entries.

## 3. Alerting Guidance

1. **Unauthorized bypass attempt**
   - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", status="401", bypass="True"}[5m])) > 0`
   - Action: verify `bypassNetworks` list; confirm expected maintenance windows; rotate credentials if suspicious.

2. **Missing scopes**
   - Query: `sum(rate(log_messages_total{logger="Concelier.Authorization.Audit", scopes="(none)", status="200"}[5m])) > 0`
   - Action: audit Authority client registration; ensure `requiredScopes` includes `concelier.jobs.trigger`, `advisory:ingest`, and `advisory:read`.

3. **Trigger failure surge**
   - Query: `sum(rate(web_jobs_trigger_failed_total[10m])) > 0` with severity `warning` if sustained for 10 minutes.
   - Action: inspect correlated audit entries and `Concelier.Telemetry` traces for job execution errors.

4. **Conflict spike**
   - Query: `sum(rate(web_jobs_trigger_conflict_total[10m])) > 5` (tune threshold).
   - Action: downstream scheduling may be firing repetitive triggers; ensure precedence is configured properly.

5. **Authority offline**
   - Watch `Concelier.Authorization.Audit` logs for `status=503` or `status=500` along with `clientId="(none)"`. Investigate Authority availability before re-enabling anonymous fallback.

## 4. Rollout & Verification Procedure

1. **Pre-checks**
   - Align with the rollout phases documented in `docs/10_CONCELIER_CLI_QUICKSTART.md` (validation → rehearsal → enforced) and record the target dates in your change request.
   - Confirm `allowAnonymousFallback` is `false` in production; keep `true` only during staged validation.
   - Validate Authority issuer metadata is reachable from Concelier (`curl https://authority.internal/.well-known/openid-configuration` from the host).

2. **Smoke test with valid token**
   - Obtain a token via CLI: `stella auth login --scope concelier.jobs.trigger`.
   - Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`.
   - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger`.

3. **Negative test without token**
   - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
   - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.

4. **Bypass check (if applicable)**
   - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.

5. **Metrics validation**
   - Ensure `web.jobs.triggered` counter increments during accepted runs.
   - Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled.

## 5. Troubleshooting

| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |

## 6. References

- `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` – Authority-side monitoring and alerting playbook.
- `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields.

2. **Smoke test with valid token**
   - Obtain a token via CLI: `stella auth login --scope "concelier.jobs.trigger advisory:ingest" --scope advisory:read`.
   - Trigger a read-only endpoint: `curl -H "Authorization: Bearer $TOKEN" https://concelier.internal/jobs/definitions`.
   - Expect HTTP 200/202 and an audit log with `bypass=False`, `scopes=concelier.jobs.trigger advisory:ingest advisory:read`, and `tenant=tenant-default`.

3. **Negative test without token**
   - Call the same endpoint without a token. Expect HTTP 401, `bypass=False`.
   - If the request succeeds, double-check `bypassNetworks` and ensure fallback is disabled.

4. **Bypass check (if applicable)**
   - From an allowed maintenance IP, call `/jobs/definitions` without a token. Confirm the audit log shows `bypass=True`. Review business justification and expiry date for such entries.

5. **Metrics validation**
   - Ensure `web.jobs.triggered` counter increments during accepted runs.
   - Exporters should show corresponding spans (`concelier.job.trigger`) if tracing is enabled.

## 5. Troubleshooting

| Symptom | Probable cause | Remediation |
|---------|----------------|-------------|
| Audit log shows `clientId=(none)` for all requests | Authority not issuing `client_id` claim or CLI outdated | Update StellaOps Authority configuration (`StellaOpsAuthorityOptions.Token.Claims.ClientId`), or upgrade the CLI token acquisition flow. |
| Requests succeed with `bypass=True` unexpectedly | Local network added to `bypassNetworks` or fallback still enabled | Remove/adjust the CIDR list, disable anonymous fallback, restart Concelier. |
| HTTP 401 with valid token | `requiredScopes` missing from client registration or token audience mismatch | Verify Authority client scopes (`concelier.jobs.trigger`) and ensure the token audience matches `audiences` config. |
| Metrics missing from Prometheus | Telemetry exporters disabled or filter missing OTEL meter | Set `concelier.telemetry.enableMetrics=true`, ensure collector includes `StellaOps.Concelier.WebService.Jobs` meter. |
| Sudden spike in `web.jobs.trigger.failed` | Downstream job failure or Authority timeout mid-request | Inspect Concelier job logs, re-run with tracing enabled, validate Authority latency. |

## 6. References

- `docs/21_INSTALL_GUIDE.md` – Authority configuration quick start.
- `docs/17_SECURITY_HARDENING_GUIDE.md` – Security guardrails and enforcement deadlines.
- `docs/ops/authority-monitoring.md` – Authority-side monitoring and alerting playbook.
- `StellaOps.Concelier.WebService/Filters/JobAuthorizationAuditFilter.cs` – source of audit log fields.

151
docs/ops/deployment-upgrade-runbook.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# Stella Ops Deployment Upgrade & Rollback Runbook

_Last updated: 2025-10-26 (Sprint 14 – DEVOPS-OPS-14-003)._

This runbook describes how to promote a new release across the supported deployment profiles (Helm and Docker Compose), how to roll back safely, and how to keep channels (`edge`, `stable`, `airgap`) aligned. All steps assume you are working from a clean checkout of the release branch/tag.

---

## 1. Channel overview

| Channel | Release manifest | Helm values | Compose profile |
|---------|------------------|-------------|-----------------|
| `edge` | `deploy/releases/2025.10-edge.yaml` | `deploy/helm/stellaops/values-dev.yaml` | `deploy/compose/docker-compose.dev.yaml` |
| `stable` | `deploy/releases/2025.09-stable.yaml` | `deploy/helm/stellaops/values-stage.yaml`, `deploy/helm/stellaops/values-prod.yaml` | `deploy/compose/docker-compose.stage.yaml`, `deploy/compose/docker-compose.prod.yaml` |
| `airgap` | `deploy/releases/2025.09-airgap.yaml` | `deploy/helm/stellaops/values-airgap.yaml` | `deploy/compose/docker-compose.airgap.yaml` |

Infrastructure components (MongoDB, MinIO, RustFS) are pinned in the release manifests and inherited by the deployment profiles. Supporting dependencies such as `nats` remain on upstream LTS tags; review `deploy/compose/*.yaml` for the authoritative set.

---

## 2. Pre-flight checklist

1. **Refresh release manifest**
   Pull the latest manifest for the channel you are promoting (`deploy/releases/<version>-<channel>.yaml`).

2. **Align deployment bundles with the manifest**
   Run the alignment checker for every profile that should pick up the release. Pass `--ignore-repo nats` to skip auxiliary services.
   ```bash
   ./deploy/tools/check-channel-alignment.py \
     --release deploy/releases/2025.10-edge.yaml \
     --target deploy/helm/stellaops/values-dev.yaml \
     --target deploy/compose/docker-compose.dev.yaml \
     --ignore-repo nats
   ```
   Repeat for other channels (`stable`, `airgap`), substituting the manifest and target files.

3. **Lint and template profiles**
   ```bash
   ./deploy/tools/validate-profiles.sh
   ```

4. **Smoke the Offline Kit debug store (edge/stable only)**
   When the release pipeline has generated `out/release/debug/.build-id/**`, mirror the assets into the Offline Kit staging tree:
   ```bash
   ./ops/offline-kit/mirror_debug_store.py \
     --release-dir out/release \
     --offline-kit-dir out/offline-kit
   ```
   Archive the resulting `out/offline-kit/metadata/debug-store.json` alongside the kit bundle.

5. **Review compatibility matrix**
   Confirm MongoDB, MinIO, and RustFS versions in the release manifest match platform SLOs. The default targets are `mongo@sha256:c258…`, `minio@sha256:14ce…`, `rustfs:2025.10.0-edge`.

6. **Create a rollback bookmark**
   Record the current Helm revision (`helm history stellaops -n stellaops`) and compose tag (`git describe --tags`) before applying changes.
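
To make the alignment step in the checklist more concrete, here is a minimal sketch of the kind of comparison such a check performs. It is not the logic of `check-channel-alignment.py` itself: the function name `find_misaligned`, the flat repository-to-digest mappings, and the sample digests are all assumptions made for illustration.

```python
def find_misaligned(release_images, target_images, ignore_repos=()):
    """Compare digests pinned in a release manifest with a deployment profile.

    Both mappings go from repository name to digest string. The result maps
    each mismatching (or missing) repository to (expected, actual).
    """
    mismatches = {}
    for repo, expected in release_images.items():
        if repo in ignore_repos:
            continue  # same intent as passing --ignore-repo
        actual = target_images.get(repo)
        if actual != expected:
            mismatches[repo] = (expected, actual)
    return mismatches


# Hypothetical digests, shortened for readability.
release = {"authority": "sha256:aaa1", "scanner-web": "sha256:bbb2", "nats": "sha256:ccc3"}
target = {"authority": "sha256:aaa1", "scanner-web": "sha256:0ld0"}

print(find_misaligned(release, target, ignore_repos=("nats",)))
```

An empty result means the profile already carries every digest the manifest pins; anything else should block the promotion until the values file is updated.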
|
||||
|
||||
---

## 3. Helm upgrade procedure (staging → production)

1. Switch to the deployment branch and ensure secrets/config maps are current.
2. Apply the upgrade in the staging cluster:
   ```bash
   helm upgrade stellaops deploy/helm/stellaops \
     -f deploy/helm/stellaops/values-stage.yaml \
     --namespace stellaops \
     --atomic \
     --timeout 15m
   ```
3. Run smoke tests (`scripts/smoke-tests.sh` or environment-specific checks).
4. Promote to production by running the same command with `deploy/helm/stellaops/values-prod.yaml`.
5. Record the new revision number and Git SHA in the change log.

### Rollback (Helm)

1. Identify the previous revision: `helm history stellaops -n stellaops`.
2. Execute:
   ```bash
   helm rollback stellaops <revision> \
     --namespace stellaops \
     --wait \
     --timeout 10m
   ```
3. Verify `kubectl get pods -n stellaops` returns healthy workloads; rerun smoke tests.
4. Update the incident/operations log with root cause and rollback details.
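
Picking the `<revision>` for the rollback can be scripted once the history is available as JSON. The sketch below assumes the history was captured with `helm history stellaops -n stellaops -o json` and that each entry carries `revision` and `status` fields; the helper name and the sample data are hypothetical, and the chosen revision should still be confirmed by hand before rolling back.

```python
import json
from typing import Optional


def last_superseded_revision(history_json: str) -> Optional[int]:
    """Return the highest revision marked 'superseded', or None if there is none."""
    entries = json.loads(history_json)
    candidates = [e["revision"] for e in entries if e.get("status") == "superseded"]
    return max(candidates) if candidates else None


# Hypothetical history: revision 7 is live, revision 6 is what it replaced.
sample = json.dumps([
    {"revision": 5, "status": "superseded"},
    {"revision": 6, "status": "superseded"},
    {"revision": 7, "status": "deployed"},
])
print(last_superseded_revision(sample))
```

Failed or pending revisions are skipped on purpose, so the candidate is always a release that was fully deployed at some point.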
|
||||
|
||||
---

## 4. Docker Compose upgrade procedure

1. Update environment files (`deploy/compose/env/*.env.example`) with any new settings and sync secrets to hosts.
2. Pull the tagged repository state corresponding to the release (e.g. `git checkout 2025.09.2` for stable).
3. Apply the upgrade:
   ```bash
   docker compose \
     --env-file deploy/compose/env/prod.env \
     -f deploy/compose/docker-compose.prod.yaml \
     pull

   docker compose \
     --env-file deploy/compose/env/prod.env \
     -f deploy/compose/docker-compose.prod.yaml \
     up -d
   ```
4. Tail logs for critical services (`docker compose logs -f authority concelier`).
5. Check monitoring dashboards/alerts to confirm normal operation.

### Rollback (Compose)

1. Check out the previous release tag (e.g. `git checkout 2025.09.1`).
2. Re-run `docker compose pull` and `docker compose up -d` with that profile. Compose recreates the services from the prior digests.
3. If reverting to a known-good snapshot is required, restore volume backups (see `docs/ops/authority-backup-restore.md` and associated service guides).
4. Log the rollback in the operations journal.

---

## 5. Channel promotion workflow

1. Author or update the channel manifest under `deploy/releases/`.
2. Mirror the new digests into Helm/Compose values and run the alignment script for each profile.
3. Commit the changes with a message that references the release version and channel (e.g. `deploy: promote 2025.10.0-edge`).
4. Publish release notes and update `deploy/releases/README.md` (if applicable).
5. Tag the repository when promoting stable or airgap builds.
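
The naming conventions used in this workflow (`deploy/releases/<version>-<channel>.yaml` manifests and `deploy: promote <version>-<channel>` commit subjects) are regular enough to check mechanically. The following is a small sketch under that assumption; the helper names and the regular expression are illustrative and are not part of the repository tooling.

```python
import re
from pathlib import PurePosixPath

CHANNELS = ("edge", "stable", "airgap")
_MANIFEST = re.compile(r"^(?P<version>\d{4}\.\d{2})-(?P<channel>[a-z]+)$")


def parse_manifest_path(path: str) -> tuple:
    """Split 'deploy/releases/<version>-<channel>.yaml' into (version, channel)."""
    match = _MANIFEST.match(PurePosixPath(path).stem)
    if match is None or match["channel"] not in CHANNELS:
        raise ValueError(f"unrecognised release manifest name: {path}")
    return match["version"], match["channel"]


def promotion_subject(release_version: str, channel: str) -> str:
    """Build a commit subject in the 'deploy: promote <version>-<channel>' style."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    return f"deploy: promote {release_version}-{channel}"


print(parse_manifest_path("deploy/releases/2025.10-edge.yaml"))
print(promotion_subject("2025.10.0", "edge"))
```

Rejecting unknown channel names early keeps a typo such as `stabel` from producing a manifest that no deployment profile ever picks up.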
|
||||
|
||||
---

## 6. Upgrade rehearsal & rollback drill log

Maintain rehearsal notes in `docs/ops/launch-cutover.md` or the relevant sprint planning document. After each drill capture:

- Release version tested
- Date/time
- Participants
- Issues encountered & fixes
- Rollback duration (if executed)

Attach the log to the sprint retro or operational wiki.

| Date (UTC) | Channel | Outcome | Notes |
|------------|---------|---------|-------|
| 2025-10-26 | Documentation dry-run | Planned | Runbook refreshed; next live drill scheduled for 2025-11 edge → stable promotion. |

---

## 7. References

- `deploy/README.md` – structure and validation workflow for deployment bundles.
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release automation and signing pipeline.
- `docs/ARCHITECTURE_DEVOPS.md` – high-level DevOps architecture, SLOs, and compliance requirements.
- `ops/offline-kit/mirror_debug_store.py` – debug-store mirroring helper.
- `deploy/tools/check-channel-alignment.py` – release vs deployment digest alignment checker.
|
||||
128
docs/ops/launch-cutover.md
Normal file
@@ -0,0 +1,128 @@
|
||||
# Launch Cutover Runbook - Stella Ops

_Document owner: DevOps Guild (2025-10-26)_
_Scope:_ Full-platform launch from staging to production for release `2025.09.2`.

## 1. Roles and Communication

| Role | Primary | Backup | Contact |
| --- | --- | --- | --- |
| Cutover lead | DevOps Guild (on-call engineer) | Platform Ops lead | `#launch-bridge` (Mattermost) |
| Authority stack | Authority Core guild rep | Security guild rep | `#authority` |
| Scanner / Queue | Scanner WebService guild rep | Runtime guild rep | `#scanner` |
| Storage | Mongo/MinIO operators | Backup DB admin | Pager escalation |
| Observability | Telemetry guild rep | SRE on-call | `#telemetry` |
| Approvals | Product owner + CTO | DevOps lead | Approval recorded in change ticket |

Set up a bridge call 30 minutes before start and keep `#launch-bridge` updated every 10 minutes.

## 2. Timeline Overview (UTC)

| Time | Activity | Owner |
| --- | --- | --- |
| T-24h | Change ticket approved, prod secrets verified, offline kit build status checked (`DEVOPS-OFFLINE-18-005`). | DevOps lead |
| T-12h | Run `deploy/tools/validate-profiles.sh`; capture logs in ticket. | DevOps engineer |
| T-6h | Freeze non-launch deployments; notify guild leads. | Product owner |
| T-2h | Execute rehearsal in staging (Section 3) using `values-stage.yaml` to verify scripts. | DevOps + module reps |
| T-30m | Final go/no-go with guild leads; confirm monitoring dashboards green. | Cutover lead |
| T0 | Execute production cutover steps (Section 4). | Cutover team |
| T+45m | Smoke tests complete (Section 5); announce success or trigger rollback. | Cutover lead |
| T+4h | Post-cutover metrics review, notify stakeholders, close ticket. | DevOps + product owner |
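
Because the timeline is expressed relative to T0, it may be convenient to expand it into absolute UTC timestamps for the change ticket. The sketch below does that with the standard library; the offsets are copied from the timeline table, while the function name and the example T0 are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Offsets copied from the timeline table.
CHECKPOINTS = {
    "T-24h": timedelta(hours=-24),
    "T-12h": timedelta(hours=-12),
    "T-6h": timedelta(hours=-6),
    "T-2h": timedelta(hours=-2),
    "T-30m": timedelta(minutes=-30),
    "T0": timedelta(0),
    "T+45m": timedelta(minutes=45),
    "T+4h": timedelta(hours=4),
}


def schedule(t0: datetime) -> dict:
    """Map each checkpoint label to an absolute, timezone-aware timestamp."""
    if t0.tzinfo is None:
        raise ValueError("t0 must be timezone-aware (UTC)")
    return {label: t0 + offset for label, offset in CHECKPOINTS.items()}


# Hypothetical cutover start; substitute the time agreed in the change ticket.
t0 = datetime(2025, 11, 3, 9, 0, tzinfo=timezone.utc)
for label, when in schedule(t0).items():
    print(f"{label:>6}  {when:%Y-%m-%d %H:%M} UTC")
```

Requiring a timezone-aware T0 avoids the common mistake of scheduling the bridge call in someone's local time.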
|
||||
|
||||
## 3. Rehearsal (Staging) Checklist

1. `docker network create stellaops_frontdoor || true` (if not present on staging jump host).
2. Run `deploy/tools/validate-profiles.sh` and archive output.
3. Apply staging secrets (`kubectl apply -f secrets/stage/*.yaml` or `helm secrets upgrade`) ensuring `stellaops-stage` credentials align with `values-stage.yaml`.
4. Perform `helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-stage.yaml` in staging cluster.
5. Verify health endpoints: `curl https://authority.stage.../healthz`, `curl https://scanner.stage.../healthz`.
6. Execute smoke CLI: `stellaops-cli scan submit --profile staging --sbom samples/sbom/demo.json` and confirm report status in UI.
7. Document total wall time and any deviations in the rehearsal log.

Rehearsal must complete without manual interventions before proceeding to production.

## 4. Production Cutover Steps

### 4.1 Pre-flight
- Confirm production secrets in the appropriate secret store (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`) contain the keys referenced in `values-prod.yaml`.
- Ensure the external reverse proxy network exists: `docker network create stellaops_frontdoor || true` on each compose host.
- Back up current configuration and data:
  - Mongo snapshot: `mongodump --uri "$MONGO_BACKUP_URI" --out /backups/launch-$(date -Iseconds)`.
  - MinIO bucket mirror: `mc mirror --overwrite minio/stellaops minio-backup/stellaops-$(date +%Y%m%d%H%M)`.

### 4.2 Apply Updates (Compose)
1. On each compose node, pull updated images for release `2025.09.2`:
   ```bash
   docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml pull
   ```
2. Deploy changes:
   ```bash
   docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml up -d
   ```
3. Confirm containers are healthy via `docker compose ps` and `docker logs <service> --tail 50`.

### 4.3 Apply Updates (Helm/Kubernetes)
If using Kubernetes, perform:
```bash
helm upgrade stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-prod.yaml --namespace stellaops --atomic --timeout 15m
```
Monitor rollout with `kubectl get pods -n stellaops --watch` and `kubectl rollout status deployment/<service> -n stellaops`.

### 4.4 Configuration Validation
- Verify Authority issuer metadata: `curl https://authority.prod.../.well-known/openid-configuration`.
- Validate Signer DSSE endpoint: `stellaops-cli signer verify --base-url https://signer.prod... --bundle samples/dsse/demo.json`.
- Check Scanner queue connectivity: `docker exec stellaops-scanner-web dotnet StellaOps.Scanner.WebService.dll health queue` (should return success).
- Ensure Notify (legacy) is still accessible while the Notifier migration is pending.
|
||||
|
||||
## 5. Smoke Tests

| Test | Command / Action | Expected Result |
| --- | --- | --- |
| API health | `curl https://scanner.prod.../healthz` | HTTP 200 with `"status":"Healthy"` |
| Scan submit | `stellaops-cli scan submit --profile prod --sbom samples/sbom/demo.json` | Scan completes in under 5 minutes; report accessible with signed DSSE |
| Runtime event ingest | Post sample event from Zastava observer fixture | `/runtime/events` responds 202 Accepted; record visible in Mongo `runtime_events` |
| Signing | `stellaops-cli signer sign --bundle demo.json` | Returns DSSE with matching SHA256 and signer metadata |
| Attestor verify | `stellaops-cli attestor verify --uuid <uuid>` | Verification result `ok=true` |
| Web UI | Manual login, verify dashboards render and latency within budget | UI loads under 2 seconds; policy views consistent |

Log results in the change ticket with timestamps and screenshots where applicable.
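
The "API health" row can be evaluated automatically once the status code and response body have been captured. The sketch below assumes the `/healthz` body is a JSON object with a top-level `status` field, as the expected result in the table suggests; the function name is hypothetical and the real payload may carry additional fields.

```python
import json


def is_healthy(status_code: int, body: str) -> bool:
    """True only for HTTP 200 with a JSON object whose 'status' is 'Healthy'."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # body was not valid JSON
    return isinstance(payload, dict) and payload.get("status") == "Healthy"


print(is_healthy(200, '{"status":"Healthy"}'))
print(is_healthy(200, '{"status":"Degraded"}'))
print(is_healthy(503, '{"status":"Healthy"}'))
```

Checking both the status code and the body guards against a proxy that answers 200 with an error page while the service behind it is down.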
|
||||
|
||||
## 6. Rollback Procedure

1. Assess failure scope; if systemic, initiate rollback immediately while preserving logs/artifacts.
2. For Compose:
   ```bash
   docker compose --env-file prod.env -f deploy/compose/docker-compose.prod.yaml down
   docker compose --env-file stage.env -f deploy/compose/docker-compose.stage.yaml up -d
   ```
3. For Helm:
   ```bash
   helm rollback stellaops <previous-revision> --namespace stellaops
   ```
4. Restore the Mongo snapshot if data inconsistency is detected: `mongorestore --uri "$MONGO_BACKUP_URI" --drop /backups/launch-<timestamp>`.
5. Restore the MinIO mirror if required: `mc mirror minio-backup/stellaops-<timestamp> minio/stellaops`.
6. Notify stakeholders of the rollback and capture root cause notes in the incident ticket.

## 7. Post-cutover Actions

- Keep heightened monitoring for 4 hours post cutover; track latency, error rates, and queue depth.
- Confirm audit trails: Authority tokens issued, Scanner events recorded, Attestor submissions stored.
- Update `docs/ops/launch-readiness.md` if any new gaps or follow-ups are discovered.
- Schedule retrospective within 48 hours; include DevOps, module guilds, and product owner.

## 8. Approval Matrix

| Step | Required Approvers | Record Location |
| --- | --- | --- |
| Production deployment plan | CTO + DevOps lead | Change ticket comment |
| Cutover start (T0) | DevOps lead + module reps | `#launch-bridge` summary |
| Post-smoke success | DevOps lead + product owner | Change ticket closure |
| Rollback (if invoked) | DevOps lead + CTO | Incident ticket |

Retain all approvals and logs for audit. Update this runbook after each execution to record actual timings and lessons learned.

## 9. Rehearsal Log

| Date (UTC) | What We Exercised | Outcome | Follow-up |
| --- | --- | --- | --- |
| 2025-10-26 | Dry-run of compose/Helm validation via `deploy/tools/validate-profiles.sh` (dev/stage/prod/airgap/mirror). Network creation simulated (`docker network create stellaops_frontdoor` planned) and stage CLI submission reviewed. | Validation script succeeded; all profiles templated cleanly. Stage deployment apply deferred because no staging cluster is accessible from the current environment. | Schedule full stage rehearsal once staging cluster credentials are available; reuse this log section to capture timings. |
|
||||
49
docs/ops/launch-readiness.md
Normal file
@@ -0,0 +1,49 @@
|
||||
# Launch Readiness Record - Stella Ops

_Updated: 2025-10-26 (UTC)_

This document captures production launch sign-offs, deployment readiness checkpoints, and any open risks that must be tracked before GA cutover.

## 1. Sign-off Summary

| Module / Service | Guild / Point of Contact | Evidence (Task or Runbook) | Status | Timestamp (UTC) | Notes |
| --- | --- | --- | --- | --- | --- |
| Authority (Issuer) | Authority Core Guild | `AUTH-AOC-19-001` - scope issuance & configuration complete (DONE 2025-10-26) | READY | 2025-10-26T14:05Z | Tenant scope propagation follow-up (`AUTH-AOC-19-002`) tracked in gaps section. |
| Signer | Signer Guild | `SIGNER-API-11-101` / `SIGNER-REF-11-102` / `SIGNER-QUOTA-11-103` (DONE 2025-10-21) | READY | 2025-10-26T14:07Z | DSSE signing, referrer verification, and quota enforcement validated in CI. |
| Attestor | Attestor Guild | `ATTESTOR-API-11-201` / `ATTESTOR-VERIFY-11-202` / `ATTESTOR-OBS-11-203` (DONE 2025-10-19) | READY | 2025-10-26T14:10Z | Rekor submission/verification pipeline green; telemetry pack published. |
| Scanner Web + Worker | Scanner WebService Guild | `SCANNER-WEB-09-10x`, `SCANNER-RUNTIME-12-30x` (DONE 2025-10-18 -> 2025-10-24) | READY* | 2025-10-26T14:20Z | Orchestrator envelope work (`SCANNER-EVENTS-16-301/302`) still open; see gaps. |
| Concelier Core & Connectors | Concelier Core / Ops Guild | Ops runbook sign-off in `docs/ops/concelier-conflict-resolution.md` (2025-10-16) | READY | 2025-10-26T14:25Z | Conflict resolution & connector coverage accepted; Mongo schema hardening pending (see gaps). |
| Excititor API | Excititor Core Guild | Wave 0 connector ingest sign-offs (EXECPLAN.Section Wave 0) | READY | 2025-10-26T14:28Z | VEX linkset publishing complete for launch datasets. |
| Notify Web (legacy) | Notify Guild | Existing stack carried forward; Notifier program tracked separately (Sprint 38-40) | PENDING | 2025-10-26T14:32Z | Legacy notify web remains operational; migration to Notifier blocked on `SCANNER-EVENTS-16-301`. |
| Web UI | UI Guild | Stable build `registry.stella-ops.org/.../web-ui@sha256:10d9248...` deployed in stage and smoke-tested | READY | 2025-10-26T14:35Z | Policy editor GA items (Sprint 20) outside launch scope. |
| DevOps / Release | DevOps Guild | `deploy/tools/validate-profiles.sh` run (2025-10-26) covering dev/stage/prod/airgap/mirror | READY | 2025-10-26T15:02Z | Compose/Helm lint + docker compose config validated; see Section 2 for details. |
| Offline Kit | Offline Kit Guild | `DEVOPS-OFFLINE-18-004` (Go analyzer) and `DEVOPS-OFFLINE-18-005` (Python analyzer) complete; debug-store mirror pending (`DEVOPS-OFFLINE-17-004`). | PENDING | 2025-10-26T15:05Z | Awaiting release debug artefacts to finalise `DEVOPS-OFFLINE-17-004`; tracked in Section 3. |

_\* READY with caveat - remaining work noted in Section 3._

## 2. Deployment Readiness Checklist

- **Production profiles committed:** `deploy/compose/docker-compose.prod.yaml` and `deploy/helm/stellaops/values-prod.yaml` added with front-door network hand-off and secret references for Mongo/MinIO/core services.
- **Secrets placeholders documented:** `deploy/compose/env/prod.env.example` enumerates required credentials (`MONGO_INITDB_ROOT_PASSWORD`, `MINIO_ROOT_PASSWORD`, Redis/NATS endpoints, `FRONTDOOR_NETWORK`). Helm values reference Kubernetes secrets (`stellaops-prod-core`, `stellaops-prod-mongo`, `stellaops-prod-minio`, `stellaops-prod-notify`).
- **Static validation executed:** `deploy/tools/validate-profiles.sh` run on 2025-10-26 (docker compose config + helm lint/template) with all profiles passing.
- **Ingress model defined:** Production compose profile introduces external `frontdoor` network; README updated with creation instructions and scope of externally reachable services.
- **Observability hooks:** Authority/Signer/Attestor telemetry packs verified; scanner runtime build-id metrics landed (`SCANNER-RUNTIME-17-401`). Grafana dashboards referenced in component runbooks.
- **Rollback assets:** Stage Compose profile remains aligned (`docker-compose.stage.yaml`), enabling rehearsals before prod cutover; release manifests (`deploy/releases/2025.09-stable.yaml`) map digests for reproducible rollback.
- **Rehearsal status:** 2025-10-26 validation dry-run executed (`deploy/tools/validate-profiles.sh` across dev/stage/prod/airgap/mirror). Full stage Helm rollout pending access to the managed staging cluster; target to complete once credentials are provisioned.

## 3. Outstanding Gaps & Follow-ups

| Item | Owner | Tracking Ref | Target / Next Step | Impact |
| --- | --- | --- | --- | --- |
| Tenant scope propagation and audit coverage | Authority Core Guild | `AUTH-AOC-19-002` (DOING 2025-10-26) | Land enforcement + audit fixtures by Sprint 19 freeze | Medium - required for multi-tenant GA but does not block initial cutover if tenants scoped manually. |
| Orchestrator event envelopes + Notifier handshake | Scanner WebService Guild | `SCANNER-EVENTS-16-301` (BLOCKED), `SCANNER-EVENTS-16-302` (DOING) | Coordinate with Gateway/Notifier owners on preview package replacement or binding redirects; rerun `dotnet test` once patch lands and refresh schema docs. Share envelope samples in `docs/events/` after tests pass. | High - gating Notifier migration; legacy notify path remains functional meanwhile. |
| Offline Kit Python analyzer bundle | Offline Kit Guild + Scanner Guild | `DEVOPS-OFFLINE-18-005` (DONE 2025-10-26) | Monitor for follow-up manifest updates and rerun smoke script when analyzers change. | Medium - ensures language analyzer coverage stays current for offline installs. |
| Offline Kit debug store mirror | Offline Kit Guild + DevOps Guild | `DEVOPS-OFFLINE-17-004` (BLOCKED 2025-10-26) | Release pipeline must publish `out/release/debug` artefacts; once available, run `mirror_debug_store.py` and commit `metadata/debug-store.json`. | Low - symbol lookup remains accessible from staging assets but required before next Offline Kit tag. |
| Mongo schema validators for advisory ingestion | Concelier Storage Guild | `CONCELIER-STORE-AOC-19-001` (TODO) | Finalize JSON schema + migration toggles; coordinate with Ops for rollout window | Low - current validation handled in app layer; schema guard adds defense-in-depth. |
| Authority plugin telemetry alignment | Security Guild | `SEC2.PLG`, `SEC3.PLG`, `SEC5.PLG` (BLOCKED pending AUTH DPoP/MTLS tasks) | Resume once upstream auth surfacing stabilises | Low - plugin remains optional; launch uses default Authority configuration. |

## 4. Approvals & Distribution

- Record shared in `#launch-readiness` (Mattermost) 2025-10-26 15:15 UTC with DevOps + Guild leads for acknowledgement.
- Updates to this document require dual sign-off from DevOps Guild (owner) and impacted module guild lead; retain change log via Git history.
- Cutover rehearsal and rollback drills are tracked separately in `docs/ops/launch-cutover.md` (see associated Task `DEVOPS-LAUNCH-18-001`).
|
||||
@@ -1,6 +1,6 @@
|
||||
# NuGet Preview Bootstrap (Offline-Friendly)
|
||||
|
||||
The StellaOps build relies on .NET 10 preview packages (Microsoft.Extensions.*, JwtBearer 10.0 RC).
|
||||
The StellaOps build relies on .NET 10 RC2 packages (Microsoft.Extensions.*, JwtBearer 10.0 RC).
|
||||
`NuGet.config` now wires three sources:
|
||||
|
||||
1. `local` → `./local-nuget` (preferred, air-gapped mirror)
|
||||
@@ -32,6 +32,21 @@ DOTNET_NOLOGO=1 dotnet restore src/StellaOps.Excititor.Connectors.Abstractions/S
|
||||
|
||||
The `packageSourceMapping` section keeps `Microsoft.Extensions.*`, `Microsoft.AspNetCore.*`, and `Microsoft.Data.Sqlite` bound to `local`/`dotnet-public`, so `dotnet restore` never has to reach out to nuget.org when mirrors are populated.

Before committing changes (or when wiring up a new environment) run:

```bash
python3 ops/devops/validate_restore_sources.py
```

The validator asserts:

- `NuGet.config` lists `local` → `dotnet-public` → `nuget.org` in that order.
- `Directory.Build.props` pins `RestoreSources` so every project prioritises the local mirror.
- No stray `NuGet.config` files shadow the repo root configuration.

CI executes the validator in both the `build-test-deploy` and `release` workflows,
so regressions trip before any restore/build begins.
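
As an illustration of the first assertion, the source order in a `NuGet.config` can be read with the standard XML parser. This is a sketch only, not the code in `validate_restore_sources.py`: the helper names are hypothetical and the feed URLs in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

EXPECTED_ORDER = ["local", "dotnet-public", "nuget.org"]


def source_keys(nuget_config_xml: str) -> list:
    """Return the package source keys in document order."""
    root = ET.fromstring(nuget_config_xml)
    return [node.get("key") for node in root.findall("./packageSources/add")]


def sources_in_expected_order(nuget_config_xml: str) -> bool:
    return source_keys(nuget_config_xml) == EXPECTED_ORDER


# Placeholder feed URLs; only the keys and their order matter here.
SAMPLE = """
<configuration>
  <packageSources>
    <clear />
    <add key="local" value="./local-nuget" />
    <add key="dotnet-public" value="https://example.invalid/dotnet-public/index.json" />
    <add key="nuget.org" value="https://example.invalid/nuget/index.json" />
  </packageSources>
</configuration>
"""

print(source_keys(SAMPLE))
print(sources_in_expected_order(SAMPLE))
```

Comparing the full list, rather than checking that each key exists, is what catches a mirror that has been reordered behind `nuget.org`.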
|
||||
|
||||
If you run fully air-gapped, remember to clear the cache between SDK upgrades:
|
||||
|
||||
```bash
|
||||
|
||||
66
docs/ops/registry-token-service.md
Normal file
@@ -0,0 +1,66 @@
|
||||
# Registry Token Service Operations

_Component_: `src/StellaOps.Registry.TokenService`

The registry token service issues short-lived Docker registry bearer tokens after
validating an Authority OpTok (DPoP/mTLS sender constraint) and the customer’s
plan entitlements. It is fronted by the Docker registry’s `Bearer realm` flow.

## Configuration

Configuration lives in `etc/registry-token.yaml` and can be overridden through
environment variables prefixed with `REGISTRY_TOKEN_`. Key sections:

| Section | Purpose |
| ------- | ------- |
| `authority` | Authority issuer/metadata URL, audience list, and scopes required to request tokens (default `registry.token.issue`). |
| `signing` | JWT issuer, signing key (PEM or PFX), optional key ID, and token lifetime (default five minutes). The repository ships **`etc/registry-signing-sample.pem`** for local testing only – replace it with a private key generated and stored per-environment before going live. |
| `registry` | Registry realm URL and optional allow-list of `service` values accepted from the registry challenge. |
| `plans` | Plan catalogue mapping plan name → repository patterns and allowed actions. Wildcards (`*`) are supported per path segment. |
| `defaultPlan` | Applied when the caller’s token omits `stellaops:plan`. |
| `revokedLicenses` | Blocks issuance when the caller presents a matching `stellaops:license` claim. |

Plan entries must cover every private repository namespace. Actions default to
`pull` if omitted.
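
To show what "wildcards per path segment" can mean in practice, here is a minimal matching sketch. It assumes that `*` matches any run of characters inside a single segment, never crosses a `/`, and that pattern and repository must have the same number of segments; the service's real matcher may differ, and both function names are hypothetical.

```python
import re


def segment_matches(pattern: str, segment: str) -> bool:
    """Match one path segment; '*' stands for any run of characters in it."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, segment) is not None


def repository_matches(pattern: str, repository: str) -> bool:
    """Match segment by segment, so '*' never spans a '/' boundary."""
    pattern_parts = pattern.strip("/").split("/")
    repo_parts = repository.strip("/").split("/")
    if len(pattern_parts) != len(repo_parts):
        return False
    return all(segment_matches(p, r) for p, r in zip(pattern_parts, repo_parts))


print(repository_matches("stella-ops/*/base", "stella-ops/public/base"))
print(repository_matches("stella-ops/*", "stella-ops/public/base"))
```

Under these assumptions a plan entry of `stella-ops/*` would not cover `stella-ops/public/base`, which is one way a namespace can end up outside every plan.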
|
||||
|
||||
## Request flow

1. Docker/OCI client contacts the registry and receives a `401` with
   `WWW-Authenticate: Bearer realm=...,service=...,scope=repository:...`.
2. Client acquires an OpTok from Authority (DPoP/mTLS bound) with the
   `registry.token.issue` scope.
3. Client calls `GET /token?service=<service>&scope=repository:<name>:<actions>`
   against the token service, presenting the OpTok and matching DPoP proof.
4. The service validates the token, plan, and requested scopes, then issues a
   JWT containing an `access` claim conforming to the Docker registry spec.

All denial paths return RFC 6750-style problem responses (HTTP 400 for malformed
scopes, 403 for plan or revocation failures).
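
The relationship between the `scope` query parameter and the `access` claim can be sketched as follows. The entry shape (`type`, `name`, `actions`) follows the Docker registry token format; the function name is hypothetical and the service's own parser may validate more strictly.

```python
def parse_scope(scope: str) -> dict:
    """Turn 'repository:<name>:<actions>' into one entry of the 'access' claim.

    Raises ValueError for malformed input (the case answered with HTTP 400).
    """
    resource_type, rest = scope.split(":", 1)
    # The name may itself contain ':' (for example a host:port prefix), so the
    # action list is taken from the last ':' only.
    name, actions = rest.rsplit(":", 1)
    action_list = [a for a in actions.split(",") if a]
    if not resource_type or not name or not action_list:
        raise ValueError(f"malformed scope: {scope!r}")
    return {"type": resource_type, "name": name, "actions": action_list}


print(parse_scope("repository:stella-ops/public/base:pull"))
print(parse_scope("repository:stella-ops/private/scanner:pull,push"))
```

The plan check then only has to compare each entry's `name` and `actions` against the patterns and actions the caller's plan allows.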
|
||||
|
||||
## Monitoring

The service emits the OpenTelemetry metrics `registry_token_issued_total` and
`registry_token_rejected_total`. Suggested Prometheus alerts:

| Metric | Condition | Action |
|--------|-----------|--------|
| `registry_token_rejected_total` | `increase(...) > 0` over 5 minutes | Investigate plan misconfiguration or licence revocation. |
| `registry_token_issued_total` | Sudden drop compared to baseline | Confirm registry is still challenging with the expected realm/service. |

Enable the built-in `/healthz` endpoint for liveness checks. Authentication and
DPoP failures surface via the service logs (Serilog console output).

## Sample deployment

```bash
dotnet run --project src/StellaOps.Registry.TokenService \
  --urls "http://0.0.0.0:8085"

curl -H "Authorization: Bearer <OpTok>" \
  -H "DPoP: $(dpop-proof ...)" \
  "http://localhost:8085/token?service=registry.localhost&scope=repository:stella-ops/public/base:pull"
```

Replace `<OpTok>` with a token issued by Authority and the `DPoP` header value
with a proof generated for that token and request. The response contains
`token`, `expires_in`, and `issued_at` fields suitable for Docker/OCI clients.
|
||||
113
docs/ops/telemetry-collector.md
Normal file
@@ -0,0 +1,113 @@
|
||||
# Telemetry Collector Deployment Guide

> **Scope:** DevOps Guild, Observability Guild, and operators enabling the StellaOps telemetry pipeline (DEVOPS-OBS-50-001 / DEVOPS-OBS-50-003).

This guide describes how to deploy the default OpenTelemetry Collector packaged with Stella Ops, validate its ingest endpoints, and prepare an offline-ready bundle for air-gapped environments.

---

## 1. Overview

The collector terminates OTLP traffic from Stella Ops services and exports metrics, traces, and logs.

| Endpoint | Purpose | TLS | Authentication |
| -------- | ------- | --- | -------------- |
| `:4317` | OTLP gRPC ingest | mTLS | Client certificate issued by collector CA |
| `:4318` | OTLP HTTP ingest | mTLS | Client certificate issued by collector CA |
| `:9464` | Prometheus scrape | mTLS | Same client certificate |
| `:13133` | Health check | mTLS | Same client certificate |
| `:1777` | pprof diagnostics | mTLS | Same client certificate |

The default configuration lives at `deploy/telemetry/otel-collector-config.yaml` and mirrors the Helm values in the `stellaops` chart.

---

## 2. Local validation (Compose)

```bash
# 1. Generate dev certificates (CA + collector + client)
./ops/devops/telemetry/generate_dev_tls.sh

# 2. Start the collector overlay
cd deploy/compose
docker compose -f docker-compose.telemetry.yaml up -d

# 3. Start the storage overlay (Prometheus, Tempo, Loki)
docker compose -f docker-compose.telemetry-storage.yaml up -d

# 4. Run the smoke test (OTLP HTTP)
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
```

The smoke test posts sample traces, metrics, and logs and verifies that the collector increments the `otelcol_receiver_accepted_*` counters exposed via the Prometheus exporter. The storage overlay gives you a local Prometheus/Tempo/Loki stack to confirm end-to-end wiring. The same client certificate can be used by local services to weave traces together. See [`Telemetry Storage Deployment`](telemetry-storage.md) for the storage configuration guidelines used in staging/production.
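
The counter check the smoke test performs can be approximated by scraping the Prometheus endpoint before and after sending data and comparing the totals. The sketch below works on the scraped text only and is not the code in `smoke_otel_collector.py`: the function name, the sample metric lines, and the label sets are assumptions.

```python
def sum_counters(exposition: str, prefix: str = "otelcol_receiver_accepted_") -> float:
    """Sum every sample whose metric name starts with prefix."""
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or not line.startswith(prefix):
            continue
        # Drop the metric name and optional {labels}; the sample value comes next.
        _, _, rest = line.rpartition("}") if "}" in line else line.partition(" ")
        total += float(rest.split()[0])
    return total


# Hypothetical scrapes taken before and after posting sample telemetry.
before = """# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",transport="http"} 10
otelcol_receiver_accepted_metric_points{receiver="otlp",transport="http"} 4
"""
after = """# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",transport="http"} 11
otelcol_receiver_accepted_metric_points{receiver="otlp",transport="http"} 5
"""

print(sum_counters(after) > sum_counters(before))
```

Comparing two scrapes rather than checking for a non-zero value keeps the check meaningful on a collector that has already been receiving traffic.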
|
||||
|
||||
---
|
||||
|
||||
## 3. Kubernetes deployment
|
||||
|
||||
Enable the collector in Helm by setting the following values (example shown for the dev profile):
|
||||
|
||||
```yaml
|
||||
telemetry:
|
||||
collector:
|
||||
enabled: true
|
||||
defaultTenant: <tenant>
|
||||
tls:
|
||||
secretName: stellaops-otel-tls-<env>
|
||||
```
|
||||
|
||||
Provide a Kubernetes secret named `stellaops-otel-tls-<env>` (for staging: `stellaops-otel-tls-stage`) with the keys `tls.crt`, `tls.key`, and `ca.crt`. The secret must contain the collector certificate, private key, and issuing CA respectively. Example:
|
||||
|
||||
```bash
|
||||
kubectl create secret generic stellaops-otel-tls-stage \
|
||||
--from-file=tls.crt=collector.crt \
|
||||
--from-file=tls.key=collector.key \
|
||||
--from-file=ca.crt=ca.crt
|
||||
```
|
||||
|
||||
Helm renders the collector deployment, service, and config map automatically:
|
||||
|
||||
```bash
|
||||
helm upgrade --install stellaops deploy/helm/stellaops -f deploy/helm/stellaops/values-dev.yaml
|
||||
```
|
||||
|
||||
Update client workloads to trust `ca.crt` and present client certificates that chain back to the same CA.
|
||||
|
||||
---
|
||||
|
||||
## 4. Offline packaging (DEVOPS-OBS-50-003)
|
||||
|
||||
Use the packaging helper to produce a tarball that can be mirrored inside the Offline Kit or air-gapped sites:
|
||||
|
||||
```bash
|
||||
python ops/devops/telemetry/package_offline_bundle.py --output out/telemetry/telemetry-bundle.tar.gz
|
||||
```
|
||||
|
||||
The script gathers:
|
||||
|
||||
- `deploy/telemetry/README.md`
|
||||
- Collector configuration (`deploy/telemetry/otel-collector-config.yaml` and Helm copy)
|
||||
- Helm template/values for the collector
|
||||
- Compose overlay (`deploy/compose/docker-compose.telemetry.yaml`)
|
||||
|
||||
The tarball ships with a `.sha256` checksum. To attach a Cosign signature, add `--sign` and provide `COSIGN_KEY_REF`/`COSIGN_IDENTITY_TOKEN` env vars (or use the `--cosign-key` flag).
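
Before importing the bundle it is worth re-checking the digest on the receiving side. The sketch below assumes the `.sha256` file follows the common `sha256sum` layout (hex digest as the first whitespace-separated token); that layout and the function names are assumptions, not something `package_offline_bundle.py` is confirmed to guarantee.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large bundles are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle: Path, checksum_file: Path) -> bool:
    """Compare the bundle digest with the first token of the checksum file."""
    expected = checksum_file.read_text(encoding="utf-8").split()[0].lower()
    return sha256_of(bundle) == expected
```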
|
||||
|
||||
Distribute the bundle alongside certificates generated by your PKI. For air-gapped installs, regenerate certificates inside the enclave and recreate the `stellaops-otel-tls-<env>` secret.
|
||||
|
||||
---
|
||||
|
||||
## 5. Operational checks
|
||||
|
||||
1. **Health probes** – `kubectl exec` into the collector pod and run `curl -fsSk --cert client.crt --key client.key --cacert ca.crt https://127.0.0.1:13133/healthz`.
|
||||
2. **Metrics scrape** – confirm Prometheus ingests `otelcol_receiver_accepted_*` counters.
|
||||
3. **Trace correlation** – ensure services propagate `trace_id` and `tenant.id` attributes; refer to `docs/observability/observability.md` for expected spans.
|
||||
4. **Certificate rotation** – when rotating the CA, update the secret and restart the collector; for a staged rollout, distribute the new client certificates before enabling `require_client_certificate`.
|
||||
|
||||
---
|
||||
|
||||
## 6. Related references
|
||||
|
||||
- `deploy/telemetry/README.md` – source configuration and local workflow.
|
||||
- `ops/devops/telemetry/smoke_otel_collector.py` – OTLP smoke test.
|
||||
- `docs/observability/observability.md` – metrics/traces/logs taxonomy.
|
||||
- `docs/13_RELEASE_ENGINEERING_PLAYBOOK.md` – release checklist for telemetry assets.
|
||||
172
docs/ops/telemetry-storage.md
Normal file
@@ -0,0 +1,172 @@
|
||||
# Telemetry Storage Deployment (DEVOPS-OBS-50-002)
|
||||
|
||||
> **Audience:** DevOps Guild, Observability Guild
|
||||
>
|
||||
> **Scope:** Prometheus (metrics), Tempo (traces), Loki (logs) storage backends with tenant isolation, TLS, retention policies, and Authority integration.
|
||||
|
||||
---
|
||||
|
||||
## 1. Components & Ports
|
||||
|
||||
| Service | Port | Purpose | TLS |
|
||||
|-----------|------|---------|-----|
|
||||
| Prometheus | 9090 | Metrics API / alerting | Client auth (mTLS) to scrape collector |
|
||||
| Tempo | 3200 | Trace ingest + API | mTLS (client cert required) |
|
||||
| Loki | 3100 | Log ingest + API | mTLS (client cert required) |
|
||||
|
||||
The collector forwards OTLP traffic to Tempo (traces), Prometheus scrapes the collector’s `/metrics` endpoint, and Loki is used for log search.
|
||||
|
||||
---
|
||||
|
||||
## 2. Local validation (Compose)
|
||||
|
||||
```bash
|
||||
./ops/devops/telemetry/generate_dev_tls.sh
|
||||
cd deploy/compose
|
||||
# Start collector + storage stack
|
||||
docker compose -f docker-compose.telemetry.yaml up -d
|
||||
docker compose -f docker-compose.telemetry-storage.yaml up -d
|
||||
python ../../ops/devops/telemetry/smoke_otel_collector.py --host localhost
|
||||
```
|
||||
|
||||
Configuration files live in `deploy/telemetry/storage/`. Adjust the overrides before shipping to staging/production.
|
||||
|
||||
---
|
||||
|
||||
## 3. Kubernetes blueprint
|
||||
|
||||
Deploy Prometheus, Tempo, and Loki to the `observability` namespace. The Helm values snippet below illustrates the key settings (charts not yet versioned—define them in the observability repo):
|
||||
|
||||
```yaml
|
||||
prometheus:
|
||||
server:
|
||||
extraFlags:
|
||||
- web.enable-lifecycle
|
||||
persistentVolume:
|
||||
enabled: true
|
||||
size: 200Gi
|
||||
additionalScrapeConfigsSecret: stellaops-prometheus-scrape
|
||||
extraSecretMounts:
|
||||
- name: otel-mtls
|
||||
secretName: stellaops-otel-tls-stage
|
||||
mountPath: /etc/telemetry/tls
|
||||
readOnly: true
|
||||
- name: otel-token
|
||||
secretName: stellaops-prometheus-token
|
||||
mountPath: /etc/telemetry/auth
|
||||
readOnly: true
|
||||
|
||||
loki:
|
||||
auth_enabled: true
|
||||
singleBinary:
|
||||
replicas: 2
|
||||
storage:
|
||||
type: filesystem
|
||||
existingSecretForTls: stellaops-otel-tls-stage
|
||||
runtimeConfig:
|
||||
configMap:
|
||||
name: stellaops-loki-tenant-overrides
|
||||
|
||||
tempo:
|
||||
server:
|
||||
http_listen_port: 3200
|
||||
storage:
|
||||
trace:
|
||||
backend: s3
|
||||
s3:
|
||||
endpoint: tempo-minio.observability.svc:9000
|
||||
bucket: tempo-traces
|
||||
multitenancyEnabled: true
|
||||
extraVolumeMounts:
|
||||
- name: otel-mtls
|
||||
mountPath: /etc/telemetry/tls
|
||||
readOnly: true
|
||||
- name: tempo-tenant-overrides
|
||||
mountPath: /etc/telemetry/tenants
|
||||
readOnly: true
|
||||
```
|
||||
|
||||
### Staging bootstrap commands
|
||||
|
||||
```bash
|
||||
kubectl create namespace observability --dry-run=client -o yaml | kubectl apply -f -
|
||||
|
||||
# TLS material (generated via ops/devops/telemetry/generate_dev_tls.sh or from PKI)
|
||||
kubectl -n observability create secret generic stellaops-otel-tls-stage \
|
||||
--from-file=tls.crt=collector-stage.crt \
|
||||
--from-file=tls.key=collector-stage.key \
|
||||
--from-file=ca.crt=collector-ca.crt
|
||||
|
||||
# Prometheus bearer token issued by Authority (scope obs:read)
|
||||
kubectl -n observability create secret generic stellaops-prometheus-token \
|
||||
--from-file=token=prometheus-stage.token
|
||||
|
||||
# Tenant overrides
|
||||
kubectl -n observability create configmap stellaops-loki-tenant-overrides \
|
||||
--from-file=overrides.yaml=deploy/telemetry/storage/tenants/loki-overrides.yaml
|
||||
|
||||
kubectl -n observability create configmap tempo-tenant-overrides \
|
||||
--from-file=tempo-overrides.yaml=deploy/telemetry/storage/tenants/tempo-overrides.yaml
|
||||
|
||||
# Additional scrape config referencing the collector service
|
||||
kubectl -n observability create secret generic stellaops-prometheus-scrape \
|
||||
--from-file=prometheus-additional.yaml=deploy/telemetry/storage/prometheus.yaml
|
||||
```
|
||||
|
||||
Provision the following secrets/configs (names can be overridden via Helm values):
|
||||
|
||||
| Name | Type | Notes |
|
||||
|------|------|-------|
|
||||
| `stellaops-otel-tls-stage` | Secret | Shared CA + server cert/key for collector/storage mTLS. |
|
||||
| `stellaops-prometheus-token` | Secret | Bearer token minted by Authority (`obs:read`). |
|
||||
| `stellaops-loki-tenant-overrides` | ConfigMap | Text from `deploy/telemetry/storage/tenants/loki-overrides.yaml`. |
|
||||
| `tempo-tenant-overrides` | ConfigMap | Text from `deploy/telemetry/storage/tenants/tempo-overrides.yaml`. |
|
||||
|
||||
---
|
||||
|
||||
## 4. Authority & tenancy integration
|
||||
|
||||
1. Create Authority clients for each backend (`observability-prometheus`, `observability-loki`, `observability-tempo`).
|
||||
```bash
|
||||
stella authority client create observability-prometheus \
|
||||
--scopes obs:read \
|
||||
--audience observability --description "Prometheus collector scrape"
|
||||
stella authority client create observability-loki \
|
||||
--scopes obs:logs timeline:read \
|
||||
--audience observability --description "Loki ingestion"
|
||||
stella authority client create observability-tempo \
|
||||
--scopes obs:traces \
|
||||
--audience observability --description "Tempo ingestion"
|
||||
```
|
||||
2. Mint tokens/credentials and store them in the secrets above (see staging bootstrap commands). Example:
|
||||
```bash
|
||||
stella authority token issue observability-prometheus --ttl 30d > prometheus-stage.token
|
||||
```
|
||||
3. Update ingress/gateway policies to forward `X-StellaOps-Tenant` into Loki/Tempo so tenant headers propagate end-to-end, and ensure each workload sets `tenant.id` attributes (see `docs/observability/observability.md`).
|
||||
|
||||
---
|
||||
|
||||
## 5. Retention & isolation
|
||||
|
||||
- Adjust `deploy/telemetry/storage/tenants/*.yaml` to set per-tenant retention and ingestion limits.
|
||||
- Configure object storage (S3, GCS, Azure Blob) when moving beyond filesystem storage.
|
||||
- For air-gapped deployments, mirror the telemetry bundle using `ops/devops/telemetry/package_offline_bundle.py` and import inside the Offline Kit staging directory.
|
||||
|
||||
---
|
||||
|
||||
## 6. Operational checklist
|
||||
|
||||
- [ ] Certificates rotated and secrets updated.
|
||||
- [ ] Prometheus scrape succeeds (`curl -sk --cert client.crt --key client.key https://collector:9464`).
|
||||
- [ ] Tempo and Loki report tenant activity (`/api/status`).
|
||||
- [ ] Retention policy tested by uploading sample data and verifying expiry.
|
||||
- [ ] Alerts wired into SLO evaluator (DEVOPS-OBS-51-001).
|
||||
|
||||
---
|
||||
|
||||
## 7. References
|
||||
|
||||
- `deploy/telemetry/storage/README.md`
|
||||
- `deploy/compose/docker-compose.telemetry-storage.yaml`
|
||||
- `docs/ops/telemetry-collector.md`
|
||||
- `docs/observability/observability.md`
|
||||
@@ -137,19 +137,33 @@ Runtime events emitted by Observer now include `process.buildId` (from the ELF
|
||||
`buildIds` list per digest. Operators can use these hashes to locate debug
|
||||
artifacts during incident response:
|
||||
|
||||
1. Capture the hash from CLI/webhook/Scanner API (example:
|
||||
1. Capture the hash from CLI/webhook/Scanner API—for example:
|
||||
```bash
|
||||
stellaops-cli runtime policy test --image <digest> --namespace <ns>
|
||||
```
|
||||
Copy one of the `Build IDs` (e.g.
|
||||
`5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789`).
|
||||
2. Derive the path: `<hash[0:2]>/<hash[2:]>` under the debug store, e.g.
|
||||
`/var/opt/debug/.build-id/5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug`.
|
||||
2. Derive the debug path (`<aa>/<rest>` under `.build-id`) and check it exists:
|
||||
```bash
|
||||
ls /var/opt/debug/.build-id/5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug
|
||||
```
|
||||
3. If the file is missing, rehydrate it from Offline Kit bundles or the
|
||||
`debug-store` object bucket (mirror of release artefacts). Use:
|
||||
```sh
|
||||
`debug-store` object bucket (mirror of release artefacts):
|
||||
```bash
|
||||
oras cp oci://registry.internal/debug-store:latest . --include \
|
||||
"5f/0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789.debug"
|
||||
```
|
||||
4. Attach the `.debug` file in `gdb`/`lldb` or feed it to `eu-unstrip` when
|
||||
preparing symbolized traces.
|
||||
5. For musl-based images, expect shorter build-id footprints. Missing hashes in
|
||||
4. Confirm the running process advertises the same GNU build-id before
|
||||
symbolising:
|
||||
```bash
|
||||
readelf -n /proc/$(pgrep -f payments-api | head -n1)/exe | grep -i 'Build ID'
|
||||
```
|
||||
5. Attach the `.debug` file in `gdb`/`lldb`, feed it to `eu-unstrip`, or cache it
|
||||
in `debuginfod` for fleet-wide symbol resolution:
|
||||
```bash
|
||||
debuginfod-find debuginfo 5f0c7c3cb4d9f8a4f1c1d5c6b7e8f90123456789 >/tmp/payments-api.debug
|
||||
```
|
||||
6. For musl-based images, expect shorter build-id footprints. Missing hashes in
|
||||
runtime events indicate stripped binaries without the GNU note—schedule a
|
||||
rebuild with `-Wl,--build-id` enabled or add the binary to the debug-store
|
||||
allowlist so the scanner can surface a fallback symbol package.
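
The path derivation from step 2 can be scripted when many build IDs need resolving. This is a small sketch; the helper name is hypothetical, and the default root simply mirrors the `/var/opt/debug/.build-id` example above.

```python
from pathlib import PurePosixPath


def debug_path(build_id: str, root: str = "/var/opt/debug/.build-id") -> PurePosixPath:
    """Map a GNU build-id to <root>/<first two hex chars>/<remaining chars>.debug."""
    build_id = build_id.strip().lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a hex build-id: {build_id!r}")
    return PurePosixPath(root) / build_id[:2] / f"{build_id[2:]}.debug"
```

Passing a different `root` yields the relative layout used inside a mirrored debug store.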
|
||||
|
||||
294
docs/policy/dsl.md
Normal file
@@ -0,0 +1,294 @@
|
||||
# Stella Policy DSL (`stella-dsl@1`)
|
||||
|
||||
> **Audience:** Policy authors, reviewers, and tooling engineers building lint/compile flows for the Policy Engine v2 rollout (Sprint 20).
|
||||
|
||||
This document specifies the `stella-dsl@1` grammar, semantics, and guardrails used by Stella Ops to transform SBOM facts, Concelier advisories, and Excititor VEX statements into effective findings. Use it with the [Policy Engine Overview](overview.md) for architectural context and the upcoming lifecycle/run guides for operational workflows.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Design Goals
|
||||
|
||||
- **Deterministic:** Same policy + same inputs ⇒ identical findings on every machine.
|
||||
- **Declarative:** No arbitrary loops, network calls, or clock access.
|
||||
- **Explainable:** Every decision records the rule, inputs, and rationale in the explain trace.
|
||||
- **Lean authoring:** Common precedence, severity, and suppression patterns are first-class.
|
||||
- **Offline-friendly:** Grammar and built-ins avoid cloud dependencies and run the same in sealed deployments.
|
||||
|
||||
---
|
||||
|
||||
## 2 · Document Structure
|
||||
|
||||
Policy packs ship one or more `.stella` files. Each file contains exactly one `policy` block:
|
||||
|
||||
```dsl
|
||||
policy "Default Org Policy" syntax "stella-dsl@1" {
|
||||
metadata {
|
||||
description = "Baseline severity + VEX precedence"
|
||||
tags = ["baseline","vex"]
|
||||
}
|
||||
|
||||
profile severity {
|
||||
map vendor_weight {
|
||||
source "GHSA" => +0.5
|
||||
source "OSV" => +0.0
|
||||
source "VendorX" => -0.2
|
||||
}
|
||||
env exposure_adjustments {
|
||||
if env.runtime == "serverless" then -0.5
|
||||
if env.exposure == "internal-only" then -1.0
|
||||
}
|
||||
}
|
||||
|
||||
rule vex_precedence priority 10 {
|
||||
when vex.any(status in ["not_affected","fixed"])
|
||||
and vex.justification in ["component_not_present","vulnerable_code_not_present"]
|
||||
then status := vex.status
|
||||
because "Strong vendor justification prevails";
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
High-level layout:
|
||||
|
||||
| Section | Purpose |
|
||||
|---------|---------|
|
||||
| `metadata` | Optional descriptive fields surfaced in Console/CLI. |
|
||||
| `imports` | Reserved for future reuse (not yet implemented in `@1`). |
|
||||
| `profile` blocks | Declarative scoring modifiers (`severity`, `trust`, `reachability`). |
|
||||
| `rule` blocks | When/then logic applied to each `(component, advisory, vex[])` tuple. |
|
||||
| `settings` | Optional evaluation toggles (sampling, default status overrides). |
|
||||
|
||||
---
|
||||
|
||||
## 3 · Lexical Rules
|
||||
|
||||
- **Case sensitivity:** Keywords are lowercase; identifiers are case-sensitive.
|
||||
- **Whitespace:** Space, tab, newline act as separators. Indentation is cosmetic.
|
||||
- **Comments:** `// inline` and `/* block */` are ignored.
|
||||
- **Literals:**
|
||||
- Strings use double quotes (`"text"`); escape with `\"`, `\n`, `\t`.
|
||||
- Numbers are decimal; suffix `%` allowed for percentage weights (`-2.5%` becomes `-0.025`).
|
||||
- Booleans: `true`, `false`.
|
||||
- Lists: `[1, 2, 3]`, `["a","b"]`.
|
||||
- **Identifiers:** Start with letter or underscore, continue with letters, digits, `_`.
|
||||
- **Operators:** `=`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `in`, `not in`, `and`, `or`, `not`, `:=`.
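
To make the percentage rule concrete, the sketch below parses a numeric literal the way the bullet above describes (`-2.5%` becomes `-0.025`). It is illustrative only: the function name is hypothetical and the real lexer may differ in how it validates input.

```python
from decimal import Decimal


def parse_number(token: str) -> Decimal:
    """Parse a decimal literal; a trailing % divides the value by 100."""
    token = token.strip()
    if token.endswith("%"):
        return Decimal(token[:-1]) / Decimal(100)
    return Decimal(token)
```

`Decimal` is used instead of `float` so that the same literal yields the same value on every machine, in keeping with the determinism goal.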
|
||||
|
||||
---
|
||||
|
||||
## 4 · Grammar (EBNF)
|
||||
|
||||
```ebnf
|
||||
policy = "policy", string, "syntax", string, "{", policy-body, "}" ;
|
||||
policy-body = { metadata | profile | settings | rule | helper } ;
|
||||
|
||||
metadata = "metadata", "{", { meta-entry }, "}" ;
|
||||
meta-entry = identifier, "=", (string | list) ;
|
||||
|
||||
profile = "profile", identifier, "{", { profile-item }, "}" ;
|
||||
profile-item= map | env-map | scalar ;
|
||||
map = "map", identifier, "{", { "source", string, "=>", number, ";" }, "}" ;
|
||||
env-map = "env", identifier, "{", { "if", expression, "then", number, ";" }, "}" ;
|
||||
scalar = identifier, "=", (number | string | list), ";" ;
|
||||
|
||||
settings = "settings", "{", { setting-entry }, "}" ;
|
||||
setting-entry = identifier, "=", (number | string | boolean), ";" ;
|
||||
|
||||
rule = "rule", identifier, [ "priority", integer ], "{",
|
||||
"when", predicate,
|
||||
{ "and", predicate },
|
||||
"then", { action },
|
||||
[ "else", { action } ],
|
||||
[ "because", string ],
|
||||
"}" ;
|
||||
|
||||
predicate = expression ;
|
||||
expression = term, { ("and" | "or"), term } ;
|
||||
term = ["not"], factor ;
|
||||
factor     = comparison | membership | function-call | literal | identifier | "(", expression, ")" ;
|
||||
comparison = value, comparator, value ;
|
||||
membership = value, ("in" | "not in"), list ;
|
||||
value = identifier | literal | function-call | field-access ;
|
||||
field-access= identifier, { ".", identifier | "[", literal, "]" } ;
|
||||
function-call = identifier, "(", [ arg-list ], ")" ;
|
||||
arg-list = expression, { ",", expression } ;
|
||||
literal = string | number | boolean | list ;
|
||||
|
||||
action = assignment | ignore | escalate | require | warn | defer | annotate ;
|
||||
assignment = target, ":=", expression, ";" ;
|
||||
target = identifier, { ".", identifier } ;
|
||||
ignore = "ignore", [ "until", expression ], [ "because", string ], ";" ;
|
||||
escalate = "escalate", [ "to", expression ], [ "when", expression ], ";" ;
|
||||
require = "requireVex", "{", require-fields, "}", ";" ;
|
||||
warn = "warn", [ "message", string ], ";" ;
|
||||
defer = "defer", [ "until", expression ], ";" ;
|
||||
annotate = "annotate", identifier, ":=", expression, ";" ;
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- `helper` is reserved for shared calculations (not yet implemented in `@1`).
|
||||
- `else` branch executes only if the `when` predicates evaluate falsy **and** no earlier rule in priority order handled the tuple.
|
||||
- Semicolons inside rule bodies are optional when each clause is on its own line; the compiler emits canonical semicolons in IR.
|
||||
|
||||
---
|
||||
|
||||
## 5 · Evaluation Context
|
||||
|
||||
Within predicates and actions you may reference the following namespaces:
|
||||
|
||||
| Namespace | Fields | Description |
|
||||
|-----------|--------|-------------|
|
||||
| `sbom` | `purl`, `name`, `version`, `licenses`, `layerDigest`, `tags`, `usedByEntrypoint` | Component metadata from Scanner. |
|
||||
| `advisory` | `id`, `source`, `aliases`, `severity`, `cvss`, `publishedAt`, `modifiedAt`, `content.raw` | Canonical Concelier advisory view. |
|
||||
| `vex` | `status`, `justification`, `statementId`, `timestamp`, `scope` | Current VEX statement when iterating; aggregator helpers available. |
|
||||
| `vex.any(...)`, `vex.all(...)`, `vex.count(...)` | (functions) | Functions operating over all matching statements. |
|
||||
| `run` | `policyId`, `policyVersion`, `tenant`, `timestamp` | Metadata for explain annotations. |
|
||||
| `env` | Arbitrary key/value pairs (e.g., `environment`, `runtime`) | Injected per run. |
|
||||
| `telemetry` | Optional reachability signals | Missing fields evaluate to `unknown`. |
|
||||
| `profile.<name>` | Maps, scalars | Values computed inside profile blocks. |
|
||||
|
||||
Missing fields evaluate to `null`, which is falsey in boolean context and propagates through comparisons unless explicitly checked.
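
The null behaviour can be pictured with a small model. This is a sketch of the stated semantics only, using Python's `None` for `null`; the helper names are hypothetical and do not describe the engine's actual implementation.

```python
import operator
from typing import Any, Optional

_COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def compare(op: str, left: Any, right: Any) -> Optional[bool]:
    """Return None when either operand is missing, else the comparison result."""
    if left is None or right is None:
        return None  # null propagates through the comparison
    return _COMPARATORS[op](left, right)


def truthy(value: Any) -> bool:
    """Boolean context: null counts as false."""
    return value is not None and bool(value)
```

A predicate such as `advisory.cvss >= 7.0` therefore does not match when `cvss` is absent, which is why a fallback is needed for fields that may be missing.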
|
||||
|
||||
---
|
||||
|
||||
## 6 · Built-ins (v1)
|
||||
|
||||
| Function / Property | Signature | Description |
|
||||
|---------------------|-----------|-------------|
|
||||
| `normalize_cvss(advisory)` | `Advisory → SeverityScalar` | Parses `advisory.content.raw` for CVSS data; falls back to policy maps. |
|
||||
| `cvss(score, vector)` | `double × string → SeverityScalar` | Constructs a severity object manually. |
|
||||
| `severity_band(value)` | `string → SeverityBand` | Normalises strings like `"critical"`, `"medium"`. |
|
||||
| `risk_score(base, modifiers...)` | Variadic | Multiplies numeric modifiers (severity × trust × reachability). |
|
||||
| `vex.any(predicate)` | `(Statement → bool) → bool` | `true` if any statement satisfies predicate. |
|
||||
| `vex.all(predicate)` | `(Statement → bool) → bool` | `true` if all statements satisfy predicate. |
|
||||
| `vex.latest()` | `→ Statement` | Lexicographically newest statement. |
|
||||
| `advisory.has_tag(tag)` | `string → bool` | Checks advisory metadata tags. |
|
||||
| `advisory.matches(pattern)` | `string → bool` | Glob match against advisory identifiers. |
|
||||
| `sbom.has_tag(tag)` | `string → bool` | Uses SBOM inventory tags (usage vs inventory). |
|
||||
| `exists(expression)` | `→ bool` | `true` when value is non-null/empty. |
|
||||
| `coalesce(a, b, ...)` | `→ value` | First non-null argument. |
|
||||
| `days_between(dateA, dateB)` | `→ int` | Absolute day difference (UTC). |
|
||||
| `percent_of(part, whole)` | `→ double` | Fractions for scoring adjustments. |
|
||||
| `lowercase(text)` | `string → string` | Normalises casing deterministically (InvariantCulture). |
|
||||
|
||||
All built-ins are pure; if inputs are null the result is null unless otherwise noted.
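
Two of the built-ins are simple enough to model directly. The sketch below shows `coalesce` and the null-in/null-out rule applied to `days_between`; it assumes timezone-aware datetimes and whole-day truncation, neither of which is spelled out in the table above.

```python
from datetime import datetime
from typing import Any, Optional


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None (None if all are)."""
    return next((v for v in values if v is not None), None)


def days_between(a: Optional[datetime], b: Optional[datetime]) -> Optional[int]:
    """Absolute whole-day difference between two timezone-aware datetimes."""
    if a is None or b is None:
        return None  # null in, null out
    # Subtracting aware datetimes compares them on the UTC timeline.
    return abs(a - b).days
```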
|
||||
|
||||
---
|
||||
|
||||
## 7 · Rule Semantics
|
||||
|
||||
1. **Ordering:** Rules execute in ascending `priority`. When priorities tie, lexical order defines precedence.
|
||||
2. **Short-circuit:** Once a rule sets `status`, subsequent rules only execute if they use `combine`. Use this sparingly to avoid ambiguity.
|
||||
3. **Actions:**
|
||||
- `status := <string>` – Allowed values: `affected`, `not_affected`, `fixed`, `suppressed`, `under_investigation`, `escalated`.
|
||||
- `severity := <SeverityScalar>` – Either from `normalize_cvss`, `cvss`, or numeric map; ensures `normalized` and `score`.
|
||||
- `ignore until <ISO-8601>` – Temporarily treats finding as suppressed until timestamp; recorded in explain trace.
|
||||
- `warn message "<text>"` – Adds warn verdict and deducts `warnPenalty`.
|
||||
- `escalate to severity_band("critical") when condition` – Forces verdict severity upward when condition true.
|
||||
- `requireVex { vendors = ["VendorX"], justifications = ["component_not_present"] }` – Fails evaluation if matching VEX evidence absent.
|
||||
- `annotate reason := "text"` – Adds free-form key/value pairs to explain payload.
|
||||
4. **Because clause:** Mandatory for actions changing status or severity; captured verbatim in explain traces.
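
The ordering and short-circuit rules can be sketched as a tiny evaluator. Several assumptions are baked in and should be read as such: "lexical order" is interpreted as declaration order, every rule carries an explicit priority, only `status` assignments are modelled, and `combine` is ignored. The `Rule` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    priority: int
    matches: Callable[[dict], bool]  # stands in for the `when` predicate
    status: str                      # value assigned by `status := ...`
    because: str                     # rationale recorded in the explain trace


def evaluate(rules: list[Rule], context: dict) -> Optional[tuple[str, str, str]]:
    """Return (status, rule name, rationale) from the first matching rule."""
    # sorted() is stable, so rules with equal priority keep declaration order.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(context):
            return rule.status, rule.name, rule.because
    return None
```

Returning the rule name and rationale alongside the status mirrors the requirement that every status change be explainable.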
|
||||
|
||||
---
|
||||
|
||||
## 8 · Scoping Helpers
|
||||
|
||||
- **Maps:** Use `profile severity { map vendor_weight { ... } }` to declare additive factors. Retrieve with `profile.severity.vendor_weight["GHSA"]`.
|
||||
- **Environment overrides:** `env` profiles allow conditional adjustments based on runtime metadata.
|
||||
- **Tenancy:** `run.tenant` ensures policies remain tenant-aware; avoid hardcoding single-tenant IDs.
|
||||
- **Default values:** Use `settings { default_status = "affected"; }` to override built-in defaults.
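
As a rough illustration of additive factors, the sketch below reuses the weights from the example policy in §2. How the engine actually combines a base score with map and `env` adjustments is not specified here, so the summation, the zero default for unknown sources, and the function names are all assumptions.

```python
# Weights copied from the `vendor_weight` map in the example policy.
VENDOR_WEIGHT = {"GHSA": 0.5, "OSV": 0.0, "VendorX": -0.2}


def exposure_adjustment(env: dict) -> float:
    """Sum the adjustments whose condition holds for this run's env values."""
    total = 0.0
    if env.get("runtime") == "serverless":
        total += -0.5
    if env.get("exposure") == "internal-only":
        total += -1.0
    return total


def adjusted_score(base: float, source: str, env: dict) -> float:
    """Add the vendor weight and environment adjustments to a base score."""
    # Assumption: sources missing from the map contribute nothing.
    return base + VENDOR_WEIGHT.get(source, 0.0) + exposure_adjustment(env)
```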
|
||||
|
||||
---
|
||||
|
||||
## 9 · Examples
|
||||
|
||||
### 9.1 Baseline Severity Normalisation
|
||||
|
||||
```dsl
|
||||
rule advisory_normalization {
|
||||
when advisory.source in ["GHSA","OSV"]
|
||||
then severity := normalize_cvss(advisory)
|
||||
because "Align vendor severity to CVSS baseline";
|
||||
}
|
||||
```
|
||||
|
||||
### 9.2 VEX Override with Quiet Mode
|
||||
|
||||
```dsl
|
||||
rule vex_strong_claim priority 5 {
|
||||
when vex.any(status == "not_affected")
|
||||
and vex.justification in ["component_not_present","vulnerable_code_not_present"]
|
||||
then status := vex.status
|
||||
annotate winning_statement := vex.latest().statementId
|
||||
warn message "VEX override applied"
|
||||
because "Strong VEX justification";
|
||||
}
|
||||
```
|
||||
|
||||
### 9.3 Environment-Specific Escalation
|
||||
|
||||
```dsl
|
||||
rule internet_exposed_guard {
|
||||
when env.exposure == "internet"
|
||||
and severity.normalized >= "High"
|
||||
then escalate to severity_band("Critical")
|
||||
because "Internet-exposed assets require critical posture";
|
||||
}
|
||||
```
|
||||
|
||||
### 9.4 Anti-pattern (flagged by linter)
|
||||
|
||||
```dsl
|
||||
rule catch_all {
|
||||
when true
|
||||
then status := "suppressed"
|
||||
because "Suppress everything" // ❌ Fails lint: unbounded suppression
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 10 · Validation & Tooling
|
||||
|
||||
- `stella policy lint` ensures:
|
||||
- Grammar compliance and canonical formatting.
|
||||
- Static determinism guard (no forbidden namespaces).
|
||||
- Anti-pattern detection (e.g., unconditional suppression, missing `because`).
|
||||
- `stella policy compile` emits IR (`.stella.ir.json`) and SHA-256 digest used in `policy_runs`.
|
||||
- CI pipelines (see `DEVOPS-POLICY-20-001`) compile sample packs and fail on lint violations.
|
||||
- Simulation harnesses (`stella policy simulate`) highlight provided/queried fields so policy authors can confirm their assumptions before promotion.
|
||||
|
||||
---
|
||||
|
||||
## 11 · Anti-patterns & Mitigations
|
||||
|
||||
| Anti-pattern | Risk | Mitigation |
|
||||
|--------------|------|------------|
|
||||
| Catch-all suppress/ignore without scope | Masks all findings | Linter blocks rules with `when true` unless `priority` > 1000 and justification includes remediation plan. |
|
||||
| Comparing strings with inconsistent casing | Missed matches | Wrap comparisons in `lowercase(value)` to align casing or normalise metadata during ingest. |
|
||||
| Referencing `telemetry` without fallback | Null propagation | Wrap access in `exists(telemetry.reachability)`. |
|
||||
| Hardcoding tenant IDs | Breaks multi-tenancy | Prefer `env.tenantTag` or metadata-sourced predicates. |
|
||||
| Duplicated rule names | Explain trace ambiguity | Compiler enforces unique `rule` identifiers within a policy. |
|
||||
|
||||
---
|
||||
|
||||
## 12 · Versioning & Compatibility
|
||||
|
||||
- `syntax "stella-dsl@1"` is mandatory.
|
||||
- Future revisions (`@2`, …) will be additive; existing packs continue to compile with their declared version.
|
||||
- The compiler canonicalises documents (sorted keys, normalised whitespace) before hashing to ensure reproducibility.
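
The idea behind canonicalise-then-hash can be shown with JSON. This sketch is not the compiler's actual canonical form; it only demonstrates that sorting keys and fixing whitespace makes the digest independent of authoring order.

```python
import hashlib
import json


def canonical_digest(document: dict) -> str:
    """Serialise with sorted keys and no optional whitespace, then hash."""
    canonical = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two documents that differ only in key order or spacing produce the same SHA-256 digest, so the digest can serve as a stable identity for a compiled policy.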
|
||||
|
||||
---
|
||||
|
||||
## 13 · Compliance Checklist
|
||||
|
||||
- [ ] **Grammar validated:** Policy passes `stella policy lint` and declares `syntax "stella-dsl@1"`.
|
||||
- [ ] **Deterministic constructs only:** No use of forbidden namespaces (`DateTime.Now`, `Guid.NewGuid`, external services).
|
||||
- [ ] **Rationales present:** Every status/severity change includes a `because` clause or `annotate` entry.
|
||||
- [ ] **Scoped suppressions:** Rules that ignore/suppress findings reference explicit components, vendors, or VEX justifications.
|
||||
- [ ] **Explain fields verified:** `annotate` keys align with Console/CLI expectations (documented in upcoming lifecycle guide).
|
||||
- [ ] **Offline parity tested:** Policy pack simulated in sealed mode (`--sealed`) to confirm absence of network dependencies.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 20).*
|
||||
239
docs/policy/lifecycle.md
Normal file
@@ -0,0 +1,239 @@
|
||||
# Policy Lifecycle & Approvals
|
||||
|
||||
> **Audience:** Policy authors, reviewers, security approvers, release engineers.
|
||||
> **Scope:** End-to-end flow for `stella-dsl@1` policies from draft through archival, including CLI/Console touch-points, Authority scopes, audit artefacts, and offline considerations.
|
||||
|
||||
This guide explains how a policy progresses through Stella Ops, which roles are involved, and the artefacts produced at every step. Pair it with the [Policy Engine Overview](overview.md), [DSL reference](dsl.md), and upcoming run documentation to ensure consistent authoring and rollout.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Protocol Summary
|
||||
|
||||
- Policies are **immutable versions** attached to a stable `policy_id`.
|
||||
- Lifecycle states: `draft → submitted → approved → active → archived`.
|
||||
- Every transition requires explicit Authority scopes and produces structured events + storage artefacts (`policies`, `policy_runs`, audit log collections).
|
||||
- Simulation and CI gating happen **before** approvals can be granted.
|
||||
- Activation triggers (runs, bundle exports, CLI `promote`) operate on the **latest approved** version per tenant.
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> Draft
|
||||
Draft --> Draft: edit/save (policy:write)
|
||||
Draft --> Submitted: submit(reviewers) (policy:submit)
|
||||
Submitted --> Draft: requestChanges (policy:write)
|
||||
Submitted --> Approved: approve (policy:approve)
|
||||
Approved --> Active: activate/run (policy:run)
|
||||
Active --> Archived: archive (policy:archive)
|
||||
Approved --> Archived: superseded/explicit archive
|
||||
Archived --> [*]
|
||||
```
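
The diagram translates naturally into a lookup table. The sketch below encodes only the edges that name a scope; the `Approved → Archived` edge is left out because the diagram does not state which scope it requires. The function and table names are hypothetical.

```python
# (current state, target state) -> scope required by the diagram above.
TRANSITIONS = {
    ("draft", "draft"): "policy:write",          # edit/save
    ("draft", "submitted"): "policy:submit",
    ("submitted", "draft"): "policy:write",      # requestChanges
    ("submitted", "approved"): "policy:approve",
    ("approved", "active"): "policy:run",
    ("active", "archived"): "policy:archive",
}


def can_transition(current: str, target: str, scopes: set[str]) -> bool:
    """True when the edge exists and the caller holds its required scope."""
    required = TRANSITIONS.get((current, target))
    return required is not None and required in scopes
```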
|
||||
|
||||
---
|
||||
|
||||
## 2 · Roles & Authority Scopes
|
||||
|
||||
| Role (suggested) | Required scopes | Responsibilities |
|
||||
|------------------|-----------------|------------------|
|
||||
| **Policy Author** | `policy:write`, `policy:submit`, `policy:simulate` | Draft DSL, run local/CI simulations, submit for review. |
|
||||
| **Policy Reviewer** | `policy:review`, `policy:simulate`, `policy:runs` | Comment on submissions, demand additional simulations, request changes. |
|
||||
| **Policy Approver** | `policy:approve`, `policy:runs`, `policy:audit` | Grant final approval, ensure sign-off evidence captured. |
|
||||
| **Policy Operator** | `policy:run`, `policy:activate`, `findings:read` | Trigger full/incremental runs, monitor results, roll back to previous version. |
|
||||
| **Policy Auditor** | `policy:audit`, `findings:read`, `policy:history` | Review past versions, verify attestations, respond to compliance requests. |
|
||||
| **Policy Engine Service** | `effective:write`, `findings:read` | Materialise effective findings during runs; no approval capabilities. |
|
||||
|
||||
> Scopes are issued by Authority (`AUTH-POLICY-20-001`). Tenants may map organisational roles (e.g., `secops.approver`) to these scopes via issuer policy.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Lifecycle Stages in Detail
|
||||
|
||||
### 3.1 Draft
|
||||
|
||||
- **Who:** Authors (`policy:write`).
|
||||
- **Tools:** Console editor, `stella policy edit`, policy DSL files.
|
||||
- **Actions:**
|
||||
- Author DSL leveraging [stella-dsl@1](dsl.md).
|
||||
- Run `stella policy lint` and `stella policy simulate --sbom <fixtures>` locally.
|
||||
- Attach rationale metadata (`metadata.description`, tags).
|
||||
- **Artefacts:**
|
||||
- `policies` document with `status=draft`, `version=n`, `provenance.created_by`.
|
||||
- Local IR cache (`.stella.ir.json`) generated by CLI compile.
|
||||
- **Guards:**
|
||||
- Draft versions never run in production.
|
||||
- CI must lint drafts before allowing submission PRs (see `DEVOPS-POLICY-20-001`).
|
||||
|
||||
### 3.2 Submission
|
||||
|
||||
- **Who:** Authors with `policy:submit`.
|
||||
- **Tools:** Console “Submit for review” button, `stella policy submit <policyId> --reviewers ...`.
|
||||
- **Actions:**
|
||||
- Provide review notes and required simulations (CLI uploads attachments).
|
||||
- Choose reviewer groups; Authority records them in submission metadata.
|
||||
- **Artefacts:**
|
||||
- Policy document transitions to `status=submitted`, capturing `submitted_by`, `submitted_at`, reviewer list, simulation digest references.
|
||||
- Audit event `policy.submitted` (Authority timeline / Notifier integration).
|
||||
- **Guards:**
|
||||
  - Submission is blocked unless the latest lint and compile runs succeeded within the past 24 h.
|
||||
- Must reference at least one simulation artefact (CLI enforces via `--attach`).
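
The two submission guards can be expressed as a single predicate. This is a sketch under stated assumptions: timestamps are timezone-aware, "fresh" means strictly less than 24 hours old, and the parameter names are illustrative rather than fields of the real submission record.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(hours=24)


def submission_allowed(
    last_lint_ok: Optional[datetime],
    last_compile_ok: Optional[datetime],
    simulation_attachments: list[str],
    now: Optional[datetime] = None,
) -> bool:
    """Require fresh lint and compile successes plus one simulation artefact."""
    now = now or datetime.now(timezone.utc)
    fresh = all(
        ts is not None and now - ts < MAX_AGE
        for ts in (last_lint_ok, last_compile_ok)
    )
    return fresh and len(simulation_attachments) >= 1
```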
|
||||
|
||||
### 3.3 Review (Submitted)
|
||||
|
||||
- **Who:** Reviewers (`policy:review`), optionally authors responding.
|
||||
- **Tools:** Console review pane (line comments, overall verdict), `stella policy review`.
|
||||
- **Actions:**
|
||||
- Inspect DSL diff vs previous approved version.
|
||||
- Run additional `simulate` jobs (UI button or CLI).
|
||||
- Request changes → policy returns to `draft` with comment log.
|
||||
- **Artefacts:**
|
||||
- Comments stored in `policy_reviews` collection with timestamps, resolved flag.
|
||||
- Additional simulation run records appended to submission metadata.
|
||||
- **Guards:**
|
||||
- Approval cannot proceed until all blocking comments resolved.
|
||||
- Required reviewers (Authority rule) must vote before approver sees “Approve” button.
|
||||
|
||||
### 3.4 Approval
|
||||
|
||||
- **Who:** Approvers (`policy:approve`).
|
||||
- **Tools:** Console “Approve”, CLI `stella policy approve <id> --version n --note "rationale"`.
|
||||
- **Actions:**
|
||||
- Confirm compliance checks (see §6) all green.
|
||||
- Provide approval note (mandatory string captured in audit trail).
|
||||
- **Artefacts:**
|
||||
- Policy `status=approved`, `approved_by`, `approved_at`, `approval_note`.
|
||||
- Audit event `policy.approved` plus optional Notifier broadcast.
|
||||
- Immutable approval record stored in `policy_history`.
|
||||
- **Guards:**
|
||||
- Approver cannot be same identity as author (enforced by Authority config).
|
||||
- Approver must attest to successful simulation diff review (`--attach diff.json`).
|
||||
|
||||
### 3.5 Activation & Runs
|
||||
|
||||
- **Who:** Operators (`policy:run`, `policy:activate`).
|
||||
- **Tools:** Console “Promote to active”, CLI `stella policy activate <id> --version n`, `stella policy run`.
|
||||
- **Actions:**
|
||||
- Mark approved version as tenant’s active policy.
|
||||
- Trigger full run or rely on orchestrator for incremental runs.
|
||||
- Monitor results via Console dashboards or CLI run logs.
|
||||
- **Artefacts:**
|
||||
- `policy_runs` entries with `mode=full|incremental`, `policy_version=n`.
|
||||
- Effective findings collections updated; explain traces stored.
|
||||
- Activation event `policy.activated` with `runId`.
|
||||
- **Guards:**
|
||||
- Activation is blocked if the previous full run (less than 24 h old) failed or is still pending.
|
||||
- Selection of SBOM/advisory snapshots uses consistent cursors recorded for reproducibility.
|
||||
|
||||
### 3.6 Archival / Rollback
|
||||
|
||||
- **Who:** Approvers or Operators with `policy:archive`.
|
||||
- **Tools:** Console menu, CLI `stella policy archive <id> --version n --reason`.
|
||||
- **Actions:**
|
||||
- Retire policies superseded by newer versions or revert to older approved version (`stella policy activate <id> --version n-1`).
|
||||
- Export archived version for audit bundles (Offline Kit integration).
|
||||
- **Artefacts:**
|
||||
- Policy `status=archived`, `archived_by`, `archived_at`, reason.
|
||||
- Audit event `policy.archived`.
|
||||
- Exported DSSE-signed policy pack stored if requested.
|
||||
- **Guards:**
|
||||
- Archival cannot proceed while runs using that version are in-flight.
|
||||
- Rollback requires documented incident reference.
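
The stage transitions described in §3 can be summarised as a small state machine. The following is a minimal sketch, not the Authority implementation: the statuses `draft`, `submitted`, `approved`, and `archived` come from the text above, whereas the `active` status name, the `ALLOWED` table, and the `transition` helper are assumptions made for illustration.

```python
# Hypothetical transition table; "active" is an assumed status name.
ALLOWED = {
    "draft": {"submitted"},
    "submitted": {"draft", "approved"},  # "request changes" returns to draft
    "approved": {"active", "archived"},
    "active": {"archived"},
    "archived": set(),
}


class TransitionError(Exception):
    """Raised when a lifecycle transition violates the table or a guard."""


def transition(current, target, *, actor, author):
    """Return the new status, or raise TransitionError if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise TransitionError(f"{current} -> {target} is not allowed")
    # Guard from section 3.4: the approver must not be the author.
    if target == "approved" and actor == author:
        raise TransitionError("approver cannot be the author")
    return target
```

Keeping the allowed moves in one table makes it easy to audit which transitions exist; the real service also enforces scopes and the gates listed in §6, which this sketch leaves out.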
|
||||
|
||||
---
|
||||
|
||||
## 4 · Tooling Touchpoints
|
||||
|
||||
| Stage | Console | CLI | API |
|
||||
|-------|---------|-----|-----|
|
||||
| Draft | Inline linting, simulation panel | `stella policy lint`, `edit`, `simulate` | `POST /policies`, `PUT /policies/{id}/versions/{v}` |
|
||||
| Submit | Submit modal (attach simulations) | `stella policy submit` | `POST /policies/{id}/submit` |
|
||||
| Review | Comment threads, diff viewer | `stella policy review --approve/--request-changes` | `POST /policies/{id}/reviews` |
|
||||
| Approve | Approve dialog | `stella policy approve` | `POST /policies/{id}/approve` |
|
||||
| Activate | Promote button, run scheduler | `stella policy activate`, `run`, `simulate` | `POST /policies/{id}/run`, `POST /policies/{id}/activate` |
|
||||
| Archive | Archive / rollback menu | `stella policy archive` | `POST /policies/{id}/archive` |
|
||||
|
||||
All CLI commands emit structured JSON by default; use `--format table` for human review.
|
||||
|
||||
---
|
||||
|
||||
## 5 · Audit & Observability
|
||||
|
||||
- **Storage:**
|
||||
- `policies` retains all versions with provenance metadata.
|
||||
- `policy_reviews` stores reviewer comments, timestamps, attachments.
|
||||
- `policy_history` summarises transitions (state, actor, note, diff digest).
|
||||
- `policy_runs` retains input cursors and determinism hash per run.
|
||||
- **Events:**
|
||||
- `policy.submitted`, `policy.review.requested`, `policy.approved`, `policy.activated`, `policy.archived`, `policy.rollback`.
|
||||
- Routed to Notifier + Timeline Indexer; offline deployments log to local event store.
|
||||
- **Logs & metrics:**
|
||||
- Policy Engine logs include `policyId`, `policyVersion`, `runId`, `approvalNote`.
|
||||
- Observability dashboards (see forthcoming `/docs/observability/policy.md`) highlight pending approvals, run SLA, VEX overrides.
|
||||
- **Reproducibility:**
|
||||
- Each state transition stores IR checksum and simulation diff digests, enabling offline audit replay.
|
||||
|
||||
---
|
||||
|
||||
## 6 · Compliance Gates
|
||||
|
||||
| Gate | Stage | Enforced by | Requirement |
|
||||
|------|-------|-------------|-------------|
|
||||
| **DSL lint** | Draft → Submit | CLI/CI | `stella policy lint` succeeded within the last 24 h. |
|
||||
| **Simulation evidence** | Submit | CLI/Console | Attach diff from `stella policy simulate` covering baseline SBOM set. |
|
||||
| **Reviewer quorum** | Submit → Approve | Authority | Minimum approver/reviewer count configurable per tenant. |
|
||||
| **Determinism CI** | Approve | DevOps job | Twin run diff passes (`DEVOPS-POLICY-20-003`). |
|
||||
| **Activation health** | Approve → Activate | Policy Engine | Last run status succeeded; orchestrator queue healthy. |
|
||||
| **Export validation** | Archive | Offline Kit | DSSE-signed policy pack generated for long-term retention. |
|
||||
|
||||
Failure of any gate emits a `policy.lifecycle.violation` event and blocks transition until resolved.
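
As an illustration of how a gate check might produce violation events, here is a minimal sketch covering only the two submission gates (DSL lint freshness and simulation evidence). The function names, the event payload shape, and the inclusive 24 h boundary are assumptions; only the event name `policy.lifecycle.violation` and the 24 h window come from this guide.

```python
from datetime import datetime, timedelta, timezone

LINT_MAX_AGE = timedelta(hours=24)  # "succeeded within 24 h"


def lint_gate(lint_ok: bool, linted_at: datetime, now: datetime) -> bool:
    """Lint must have succeeded and be fresh enough (boundary assumed inclusive)."""
    return lint_ok and (now - linted_at) <= LINT_MAX_AGE


def simulation_gate(attachments: list) -> bool:
    """At least one simulation artefact must be attached."""
    return len(attachments) >= 1


def check_submit_gates(lint_ok, linted_at, attachments, now):
    """Return violation events; an empty list means the transition may proceed."""
    failed = []
    if not lint_gate(lint_ok, linted_at, now):
        failed.append("DSL lint")
    if not simulation_gate(attachments):
        failed.append("Simulation evidence")
    # Hypothetical payload shape: one event per failed gate.
    return [{"event": "policy.lifecycle.violation", "gate": gate} for gate in failed]


if __name__ == "__main__":
    now = datetime(2025, 10, 26, 12, 0, tzinfo=timezone.utc)
    print(check_submit_gates(True, now - timedelta(hours=25), ["diff.json"], now))
```

Returning the full list of failures, rather than stopping at the first, lets an author fix every blocking gate in one pass.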
|
||||
|
||||
---
|
||||
|
||||
## 7 · Offline / Air-Gap Considerations
|
||||
|
||||
- Offline Kit bundles include:
|
||||
- Approved policy packs (`.policy.bundle` + DSSE signatures).
|
||||
- Submission/approval audit logs.
|
||||
- Simulation diff JSON for reproducibility.
|
||||
- Air-gapped sites operate with the same lifecycle:
|
||||
- Approvals happen locally; Authority runs in enclave.
|
||||
- Rollout requires manual import of policy packs from connected environment via signed bundles.
|
||||
- `stella policy simulate --sealed` ensures no outbound calls; required before approval in sealed mode.
|
||||
|
||||
---
|
||||
|
||||
## 8 · Incident Response & Rollback
|
||||
|
||||
- Incident mode (triggered via `policy incident activate`) forces:
|
||||
- Immediate incremental run to evaluate mitigation policies.
|
||||
- Expanded trace retention for affected runs.
|
||||
- Automatic snapshot of currently active policies for evidence locker.
|
||||
- Rollback path:
|
||||
1. `stella policy activate <id> --version <previous>` with incident note.
|
||||
2. Orchestrator schedules full run to ensure findings align.
|
||||
3. Archive problematic version with reason referencing incident ticket.
|
||||
- Post-incident review must confirm new version passes gates before re-activation.
|
||||
|
||||
---
|
||||
|
||||
## 9 · CI/CD Integration (Reference)
|
||||
|
||||
- **Pre-merge:** run lint + simulation jobs against golden SBOM fixtures.
|
||||
- **Post-merge (main):** compile, compute IR checksum, stage for Offline Kit.
|
||||
- **Nightly:** determinism replay, drift alerts on `policy simulate` diffs, and a report on the backlog of pending approvals.
|
||||
- **Notifications:** Slack/Email via Notifier when submissions await review beyond the SLA or when approvals succeed.
|
||||
|
||||
---
|
||||
|
||||
## 10 · Compliance Checklist
|
||||
|
||||
- [ ] **Role mapping validated:** Authority issuer config maps organisational roles to required `policy:*` scopes (per tenant).
|
||||
- [ ] **Submission evidence attached:** Latest simulation diff and lint artefacts linked to submission.
|
||||
- [ ] **Reviewer quorum met:** All required reviewers approved or acknowledged; no unresolved blocking comments.
|
||||
- [ ] **Approval note logged:** Approver justification recorded in audit trail alongside IR checksum.
|
||||
- [ ] **Activation guard passed:** Latest run status success, orchestrator queue healthy, determinism job green.
|
||||
- [ ] **Archive bundles produced:** When archiving, DSSE-signed policy pack exported and stored for offline retention.
|
||||
- [ ] **Offline parity proven:** For sealed deployments, `--sealed` simulations executed and logged before approval.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 20).*
|
||||
|
||||
173
docs/policy/overview.md
Normal file
@@ -0,0 +1,173 @@
|
||||
# Policy Engine Overview
|
||||
|
||||
> **Goal:** Evaluate organisation policies deterministically against scanner SBOMs, Concelier advisories, and Excititor VEX evidence, then publish effective findings that downstream services can trust.
|
||||
|
||||
This document introduces the v2 Policy Engine: how the service fits into Stella Ops, the artefacts it produces, the contracts it honours, and the guardrails that keep policy decisions reproducible across air-gapped and connected deployments.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Role in the Platform
|
||||
|
||||
- **Purpose:** Compose policy verdicts by reconciling SBOM inventory, advisory metadata, VEX statements, and organisation rules.
|
||||
- **Form factor:** Dedicated `.NET 10` Minimal API host (`StellaOps.Policy.Engine`) plus worker orchestration. Policies are defined in `stella-dsl@1` packs compiled to an intermediate representation (IR) with a stable SHA-256 digest.
|
||||
- **Tenancy:** All workloads run under Authority-enforced scopes (`policy:*`, `findings:read`, `effective:write`). Only the Policy Engine identity may materialise effective findings collections.
|
||||
- **Consumption:** Findings ledger, Console, CLI, and Notify read the published `effective_finding_{policyId}` materialisations and policy run ledger (`policy_runs`).
|
||||
- **Offline parity:** Bundled policies import/export alongside advisories and VEX. In sealed mode the engine degrades gracefully, annotating explanations whenever cached signals replace live lookups.
|
||||
|
||||
---
|
||||
|
||||
## 2 · High-Level Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph Inputs
|
||||
A[Scanner SBOMs<br/>Inventory & Usage]
|
||||
B[Concelier Advisories<br/>Canonical linksets]
|
||||
C[Excititor VEX<br/>Consensus status]
|
||||
D[Policy Packs<br/>stella-dsl@1]
|
||||
end
|
||||
subgraph PolicyEngine["StellaOps.Policy.Engine"]
|
||||
P1[DSL Compiler<br/>IR + Digest]
|
||||
P2[Joiners<br/>SBOM ↔ Advisory ↔ VEX]
|
||||
P3[Deterministic Evaluator<br/>Rule hits + scoring]
|
||||
P4[Materialisers<br/>effective findings]
|
||||
P5[Run Orchestrator<br/>Full & incremental]
|
||||
end
|
||||
subgraph Outputs
|
||||
O1[Effective Findings Collections]
|
||||
O2[Explain Traces<br/>Rule hit lineage]
|
||||
O3[Metrics & Traces<br/>policy_run_seconds,<br/>rules_fired_total]
|
||||
O4[Simulation/Preview Feeds<br/>CLI & Studio]
|
||||
end
|
||||
|
||||
A --> P2
|
||||
B --> P2
|
||||
C --> P2
|
||||
D --> P1 --> P3
|
||||
P2 --> P3 --> P4 --> O1
|
||||
P3 --> O2
|
||||
P5 --> P3
|
||||
P3 --> O3
|
||||
P3 --> O4
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3 · Core Concepts
|
||||
|
||||
| Concept | Description |
|
||||
|---------|-------------|
|
||||
| **Policy Pack** | Versioned bundle of DSL documents, metadata, and checksum manifest. Packs import/export via CLI and Offline Kit bundles. |
|
||||
| **Policy Digest** | SHA-256 of the canonical IR; used for caching, explain trace attribution, and audit proofs. |
|
||||
| **Effective Findings** | Append-only Mongo collections (`effective_finding_{policyId}`) storing the latest verdict per finding, plus history sidecars. |
|
||||
| **Policy Run** | Execution record persisted in `policy_runs` capturing inputs, run mode, timings, and determinism hash. |
|
||||
| **Explain Trace** | Structured tree showing rule matches, data provenance, and scoring components for UI/CLI explain features. |
|
||||
| **Simulation** | Dry-run evaluation that compares a candidate pack against the active pack and produces verdict diffs without persisting results. |
|
||||
| **Incident Mode** | Elevated sampling/trace capture toggled automatically when SLOs breach; emits events for Notifier and Timeline Indexer. |
|
||||
|
||||
---
|
||||
|
||||
## 4 · Inputs & Pre-processing
|
||||
|
||||
### 4.1 SBOM Inventory
|
||||
|
||||
- **Source:** Scanner.WebService publishes inventory/usage SBOMs plus BOM-Index (roaring bitmap) metadata.
|
||||
- **Consumption:** Policy joiners use the index to expand candidate components quickly, keeping evaluation within the `< 5 s` warm-path budget.
|
||||
- **Schema:** CycloneDX Protobuf + JSON views; Policy Engine reads canonical projections via shared SBOM adapters.
|
||||
|
||||
### 4.2 Advisory Corpus
|
||||
|
||||
- **Source:** Concelier exports canonical advisories with deterministic identifiers, linksets, and equivalence tables.
|
||||
- **Contract:** Policy Engine consumes only the raw `content.raw`, `identifiers`, and `linkset` fields, per the Aggregation-Only Contract (AOC); derived precedence remains a policy concern.
|
||||
|
||||
### 4.3 VEX Evidence
|
||||
|
||||
- **Source:** Excititor consensus service resolves OpenVEX / CSAF statements, preserving conflicts.
|
||||
- **Usage:** Policy rules can require specific VEX vendors or justification codes; evaluator records when cached evidence substitutes for live statements (sealed mode).
|
||||
|
||||
### 4.4 Policy Packs
|
||||
|
||||
- Authored in Policy Studio or CLI, validated against the `stella-dsl@1` schema.
|
||||
- Compiler performs canonicalisation (ordering, defaulting) before emitting IR and digest.
|
||||
- Packs bundle scoring profiles, allowlist metadata, and optional reachability weighting tables.
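
To illustrate why canonicalisation must precede hashing, the sketch below derives a stable SHA-256 digest from an IR document. It assumes the IR can be represented as JSON-compatible data and uses sorted keys with compact separators as the canonical form; the real compiler's ordering and defaulting rules are not specified here, and `canonicalise` and `policy_digest` are hypothetical names.

```python
import hashlib
import json


def canonicalise(ir) -> bytes:
    """Serialise the IR with sorted keys and no insignificant whitespace."""
    text = json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def policy_digest(ir) -> str:
    """SHA-256 over the canonical bytes, prefixed with the algorithm name."""
    return "sha256:" + hashlib.sha256(canonicalise(ir)).hexdigest()
```

Two IR documents that differ only in key order produce the same digest, which is what allows the digest to serve as a cache key and audit reference.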
|
||||
|
||||
---
|
||||
|
||||
## 5 · Evaluation Flow
|
||||
|
||||
1. **Run selection** – Orchestrator accepts `full`, `incremental`, or `simulate` jobs. Incremental runs listen to change streams from Concelier, Excititor, and SBOM imports to scope re-evaluation.
|
||||
2. **Input staging** – Candidates fetched in deterministic batches; identity graph from Concelier strengthens PURL lookups.
|
||||
3. **Rule execution** – Evaluator walks rules in lexical order (first-match wins). Actions available: `block`, `ignore`, `warn`, `defer`, `escalate`, `requireVex`, each supporting quieting semantics where permitted.
|
||||
4. **Scoring** – `PolicyScoringConfig` applies severity, trust, reachability weights plus penalties (`warnPenalty`, `ignorePenalty`, `quietPenalty`).
|
||||
5. **Verdict and explain** – Engine constructs `PolicyVerdict` records with inputs, quiet flags, unknown confidence bands, and provenance markers; explain trees capture rule lineage.
|
||||
6. **Materialisation** – Effective findings collections are upserted append-only, stamped with run identifier, policy digest, and tenant.
|
||||
7. **Publishing** – Completed run writes to `policy_runs`, emits metrics (`policy_run_seconds`, `rules_fired_total`, `vex_overrides_total`), and raises events for Console/Notify subscribers.
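
The first-match semantics of step 3 can be sketched as follows. This is an illustrative model only: "lexical order" is read here as sorting by rule name, which is an assumption (the engine may instead mean source order), and the `Rule` type and `evaluate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]
    action: str  # e.g. "block", "warn", "ignore"


def evaluate(rules, finding) -> Optional[Tuple[str, str]]:
    """Return (rule name, action) for the first matching rule, or None."""
    for rule in sorted(rules, key=lambda r: r.name):
        if rule.matches(finding):
            return (rule.name, rule.action)  # first match wins; later rules are skipped
    return None
```

Because evaluation stops at the first hit, a fixed rule ordering is part of what makes verdicts reproducible across runs.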
|
||||
|
||||
---
|
||||
|
||||
## 6 · Run Modes
|
||||
|
||||
| Mode | Trigger | Scope | Persistence | Typical Use |
|
||||
|------|---------|-------|-------------|-------------|
|
||||
| **Full** | Manual CLI (`stella policy run`), scheduled nightly, or emergency rebaseline | Entire tenant | Writes effective findings and run record | After policy publish or major advisory/VEX import |
|
||||
| **Incremental** | Change-stream queue driven by Concelier/Excititor/SBOM deltas | Only affected artefacts | Writes effective findings and run record | Continuous upkeep; ensures SLA ≤ 5 min from source change |
|
||||
| **Simulate** | CLI/Studio preview, CI pipelines | Candidate subset (diff against baseline) | No materialisation; produces explain & diff payloads | Policy authoring, CI regression suites |
|
||||
|
||||
All modes are cancellation-aware and checkpoint progress for replay in case of deployment restarts.
|
||||
|
||||
---
|
||||
|
||||
## 7 · Outputs & Integrations
|
||||
|
||||
- **APIs** – Minimal API exposes policy CRUD, run orchestration, explain fetches, and cursor-based listing of effective findings (see `/docs/api/policy.md` once published).
|
||||
- **CLI** – `stella policy simulate/run/show` commands surface JSON verdicts, exit codes, and diff summaries suitable for CI gating.
|
||||
- **Console / Policy Studio** – UI reads explain traces, policy metadata, approval workflow status, and simulation diffs to guide reviewers.
|
||||
- **Findings Ledger** – Effective findings feed downstream export, Notify, and risk scoring jobs.
|
||||
- **Air-gap bundles** – Offline Kit includes policy packs, scoring configs, and explain indexes; export commands generate DSSE-signed bundles for transfer.
|
||||
|
||||
---
|
||||
|
||||
## 8 · Determinism & Guardrails
|
||||
|
||||
- **Deterministic inputs** – All joins rely on canonical linksets and equivalence tables; batches are sorted, and random/wall-clock APIs are blocked by static analysis plus runtime guards (`ERR_POL_004`).
|
||||
- **Stable outputs** – Canonical JSON serializers sort keys; digests recorded in run metadata enable reproducible diffs across machines.
|
||||
- **Idempotent writes** – Materialisers upsert using `{policyId, findingId, tenant}` keys and retain prior versions with append-only history.
|
||||
- **Sandboxing** – Policy evaluation executes in-process with timeouts; restart-only plug-ins guarantee no runtime DLL injection.
|
||||
- **Compliance proof** – Every run stores digest of inputs (policy, SBOM batch, advisory snapshot) so auditors can replay decisions offline.
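
The idempotent-write guardrail can be modelled with an in-memory store. This is a sketch under stated assumptions: it uses the `{policyId, findingId, tenant}` key from the bullet above, but the `EffectiveFindings` class, the document shape, and the in-memory history list are hypothetical stand-ins for the Mongo collections.

```python
class EffectiveFindings:
    """In-memory stand-in for an effective findings collection plus history."""

    def __init__(self):
        self.current = {}  # key -> latest verdict document
        self.history = []  # append-only list of superseded versions

    def upsert(self, policy_id, finding_id, tenant, verdict, run_id):
        """Write a verdict; return False when the write changes nothing."""
        key = (policy_id, finding_id, tenant)
        doc = {"verdict": verdict, "run_id": run_id}
        previous = self.current.get(key)
        if previous == doc:
            return False  # replaying the same write is a no-op
        if previous is not None:
            self.history.append((key, previous))  # never delete, only append
        self.current[key] = doc
        return True
```

Replaying a run after a crash therefore leaves both the current view and the history unchanged, while a genuinely new verdict moves the old one into history.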
|
||||
|
||||
---
|
||||
|
||||
## 9 · Security, Tenancy & Offline Notes
|
||||
|
||||
- **Authority scopes:** Gateway enforces `policy:read`, `policy:write`, `policy:simulate`, `policy:runs`, `findings:read`, `effective:write`. Service identities must present DPoP-bound tokens.
|
||||
- **Tenant isolation:** Collections partition by tenant identifier; cross-tenant queries require explicit admin scopes and return audit warnings.
|
||||
- **Sealed mode:** In air-gapped deployments the engine surfaces `sealed=true` hints in explain traces, warning about cached EPSS/KEV data and suggesting bundle refreshes (see `docs/airgap/EPIC_16_AIRGAP_MODE.md` §3.7).
|
||||
- **Observability:** Structured logs carry correlation IDs matching orchestrator job IDs; metrics integrate with OpenTelemetry exporters; sampled rule-hit logs redact policy secrets.
|
||||
- **Incident response:** Incident mode can be forced via API, boosting trace retention and notifying Notifier through `policy.incident.activated` events.
|
||||
|
||||
---
|
||||
|
||||
## 10 · Working with Policy Packs
|
||||
|
||||
1. **Author** in Policy Studio or edit DSL files locally. Validate with `stella policy lint`.
|
||||
2. **Simulate** against golden SBOM fixtures (`stella policy simulate --sbom fixtures/*.json`). Inspect explain traces for unexpected overrides.
|
||||
3. **Publish** via API or CLI; Authority enforces review/approval workflows (`draft → review → approve → rollout`).
|
||||
4. **Monitor** the subsequent incremental runs; if the determinism diff fails in CI, roll back the pack while investigating the digests.
|
||||
5. **Bundle** packs for offline sites with `stella policy bundle export` and distribute via Offline Kit.
|
||||
|
||||
---
|
||||
|
||||
## 11 · Compliance Checklist
|
||||
|
||||
- [ ] **Scopes enforced:** Confirm gateway policy requires `policy:*` and `effective:write` scopes for all mutating endpoints.
|
||||
- [ ] **Determinism guard active:** Static analyzer blocks clock/RNG usage; CI determinism job diffing repeated runs passes.
|
||||
- [ ] **Materialisation audit:** Effective findings collections use append-only writers and retain history per policy run.
|
||||
- [ ] **Explain availability:** UI/CLI expose explain traces for every verdict; sealed-mode warnings display when cached evidence is used.
|
||||
- [ ] **Offline parity:** Policy bundles (import/export) tested in sealed environment; air-gap degradations documented for operators.
|
||||
- [ ] **Observability wired:** Metrics (`policy_run_seconds`, `rules_fired_total`, `vex_overrides_total`) and sampled rule hit logs emit to the shared telemetry pipeline with correlation IDs.
|
||||
- [ ] **Documentation synced:** API (`/docs/api/policy.md`), DSL grammar (`/docs/policy/dsl.md`), lifecycle (`/docs/policy/lifecycle.md`), and run modes (`/docs/policy/runs.md`) cross-link back to this overview.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 20).*
|
||||
|
||||
187
docs/policy/runs.md
Normal file
@@ -0,0 +1,187 @@
|
||||
# Policy Runs & Orchestration
|
||||
|
||||
> **Audience:** Policy Engine operators, Scheduler team, DevOps, and tooling engineers planning CI integrations.
|
||||
> **Scope:** Run modes (`full`, `incremental`, `simulate`), orchestration pipeline, cursor management, replay/determinism guarantees, monitoring, and recovery procedures.
|
||||
|
||||
Policies only generate value when they execute deterministically against current SBOM, advisory, and VEX inputs. This guide explains how runs are triggered, how the orchestrator scopes work, and what artefacts you should expect at each stage.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Run Modes at a Glance
|
||||
|
||||
| Mode | Trigger sources | Scope | Persistence | Primary use |
|
||||
|------|-----------------|-------|-------------|-------------|
|
||||
| **Full** | Manual CLI (`stella policy run`), Console “Run now”, scheduled nightly job | Entire tenant (all registered SBOMs) | Writes `effective_finding_{policyId}` and `policy_runs` record | Baseline after policy approval, quarterly attestation, post-incident rechecks |
|
||||
| **Incremental** | Change streams (Concelier advisories, Excititor VEX, SBOM imports), orchestrator cron | Only affected `(sbom, advisory)` tuples | Writes diffs to effective findings and run record | Continuous upkeep meeting ≤ 5 min SLA from input change |
|
||||
| **Simulate** | Console review workspace, CLI (`stella policy simulate`), CI pipeline | Selected SBOM sample set (provided or golden set) | No materialisation; captures diff summary + explain traces | Authoring validation, regression safeguards, sealed-mode rehearsals |
|
||||
|
||||
All modes record their status in `policy_runs` with deterministic metadata:
|
||||
|
||||
```jsonc
|
||||
{
|
||||
"_id": "run:P-7:2025-10-26T14:05:11Z:3f9a",
|
||||
"policy_id": "P-7",
|
||||
"policy_version": 4,
|
||||
"mode": "incremental",
|
||||
"status": "succeeded", // queued | running | succeeded | failed | canceled | replay_pending
|
||||
"inputs": {
|
||||
"sbom_set": ["sbom:S-42","sbom:S-318"],
|
||||
"advisory_cursor": "2025-10-26T13:59:00Z",
|
||||
"vex_cursor": "2025-10-26T13:58:30Z",
|
||||
"env": {"exposure":"internet"}
|
||||
},
|
||||
"stats": {
|
||||
"components": 1742,
|
||||
"rules_fired": 68023,
|
||||
"findings_written": 4321,
|
||||
"vex_overrides": 210
|
||||
},
|
||||
"determinism_hash": "sha256:…",
|
||||
"started_at": "2025-10-26T14:05:11Z",
|
||||
"finished_at": "2025-10-26T14:06:01Z",
|
||||
"tenant": "default"
|
||||
}
|
||||
```
|
||||
|
||||
> **Schemas & samples:** see `src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-20-001-POLICY-RUNS.md` and the fixtures in `samples/api/scheduler/policy-*.json` for canonical payloads consumed by CLI/UI/worker integrations.
|
||||
|
||||
---
|
||||
|
||||
## 2 · Pipeline Overview
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
autonumber
|
||||
participant Trigger as Trigger (CLI / Console / Change Stream)
|
||||
participant Orchestrator as Policy Orchestrator
|
||||
participant Queue as Scheduler Queue (Mongo/NATS)
|
||||
participant Engine as Policy Engine Workers
|
||||
participant Concelier as Concelier Service
|
||||
participant Excititor as Excititor Service
|
||||
participant SBOM as SBOM Service
|
||||
participant Store as Mongo (policy_runs & effective_finding_*)
|
||||
participant Observability as Metrics/Events
|
||||
|
||||
Trigger->>Orchestrator: Run request (mode, scope, env)
|
||||
Orchestrator->>Queue: Enqueue PolicyRunRequest (idempotent key)
|
||||
Queue->>Engine: Lease job (fairness window)
|
||||
Engine->>Concelier: Fetch advisories + linksets (cursor-aware)
|
||||
Engine->>Excititor: Fetch VEX statements (cursor-aware)
|
||||
Engine->>SBOM: Fetch SBOM segments / BOM-Index
|
||||
Engine->>Engine: Evaluate policy (deterministic batches)
|
||||
Engine->>Store: Upsert effective findings + append history
|
||||
Engine->>Store: Persist policy_runs record + determinism hash
|
||||
Engine->>Observability: Emit metrics, traces, rule-hit logs
|
||||
Engine->>Orchestrator: Ack completion / failure
|
||||
Orchestrator->>Trigger: Notify (webhook, CLI, Console update)
|
||||
```
|
||||
|
||||
- **Trigger** – CLI, Console, or automated change stream publishes a `PolicyRunRequest`.
|
||||
- **Orchestrator** – Runs inside `StellaOps.Policy.Engine` worker host; applies fairness (tenant + policy quotas) and idempotency using run keys.
|
||||
- **Queue** – Backed by Mongo + optional NATS for fan-out; supports leases and replay on crash.
|
||||
- **Engine** – Stateless worker executing the deterministic evaluator.
|
||||
- **Store** – Mongo collections: `policy_runs`, `effective_finding_{policyId}`, `policy_run_events` (append-only history), optional object storage for explain traces.
|
||||
- **Observability** – Prometheus metrics (`policy_run_seconds`), OTLP traces, structured logs.
|
||||
|
||||
---
|
||||
|
||||
## 3 · Input Scoping & Cursors
|
||||
|
||||
### 3.1 Advisory & VEX Cursors
|
||||
|
||||
- Each run records the latest Concelier change stream timestamp (`advisory_cursor`) and Excititor timestamp (`vex_cursor`).
|
||||
- Incremental runs receive change batches `(feedId, lastOffset)`; orchestrator deduplicates using `change_digest`.
|
||||
- Full runs set cursors to “current read time”, effectively resetting the incremental baseline.
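
Deduplication of change batches can be sketched as below. How `change_digest` is actually computed is not specified in this guide; hashing `feedId:lastOffset` is an assumption, as are the function names.

```python
import hashlib


def change_digest(feed_id, last_offset) -> str:
    """Hypothetical digest: SHA-256 over 'feedId:lastOffset'."""
    return hashlib.sha256(f"{feed_id}:{last_offset}".encode("utf-8")).hexdigest()


def dedupe(batches, seen=None):
    """Keep the first occurrence of each (feed_id, last_offset) batch, in order."""
    seen = set() if seen is None else seen
    fresh = []
    for feed_id, last_offset in batches:
        digest = change_digest(feed_id, last_offset)
        if digest not in seen:
            seen.add(digest)
            fresh.append((feed_id, last_offset))
    return fresh
```

Passing a shared `seen` set across calls would let an orchestrator drop batches that were already processed by an earlier run.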
|
||||
|
||||
### 3.2 SBOM Selection
|
||||
|
||||
- Full runs enumerate all SBOM records declared active for the tenant.
|
||||
- Incremental runs derive SBOM set by intersecting advisory/VEX changes with BOM-Index lookups (component → SBOM mapping).
|
||||
- Simulations accept explicit SBOM list; if omitted, CLI uses `etc/policy/golden-sboms.json`.
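
The incremental selection rule (changed components mapped to SBOMs through the BOM-Index) can be sketched with a plain dictionary. The real BOM-Index is described elsewhere as roaring-bitmap metadata; the `dict` of sets and the `affected_sboms` helper are simplifying assumptions.

```python
def affected_sboms(changed_components, bom_index):
    """Return the sorted SBOM ids that contain any changed component.

    bom_index maps a component identifier (e.g. a PURL) to a set of SBOM ids.
    """
    selected = set()
    for component in changed_components:
        selected |= bom_index.get(component, set())
    return sorted(selected)  # sorted so downstream batching stays deterministic
```

Components absent from the index contribute nothing, so an advisory that touches no registered SBOM yields an empty run scope.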
|
||||
|
||||
### 3.3 Environment Metadata
|
||||
|
||||
- `env` block (free-form key/values) allows scenario-specific evaluation (e.g., `env.exposure=internet`).
|
||||
- Stored verbatim in `policy_runs.inputs.env` for replay; orchestrator hashes environment data to avoid cache collisions.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Execution Semantics
|
||||
|
||||
1. **Preparation:** Worker loads compiled IR for target policy version (cached by digest).
|
||||
2. **Batching:** Candidate tuples are grouped by SBOM, then by advisory to maintain deterministic order; page size defaults to 1024 tuples.
|
||||
3. **Evaluation:** Rules execute with first-match semantics; results captured as `PolicyVerdict`.
|
||||
4. **Materialisation:**
|
||||
- Upserts into `effective_finding_{policyId}` using `{policyId, sbomId, findingKey}`.
|
||||
- Previous versions stored in `effective_finding_{policyId}_history`.
|
||||
5. **Explain storage:** Full explain trees stored in blob store when `captureExplain=true`; incremental runs keep sampled traces (configurable).
|
||||
6. **Completion:** Worker writes final status, stats, determinism hash (combination of policy digest + ordered input digests), and emits `policy.run.completed` event.
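
Steps 2 and 6 above can be sketched together: tuples are sorted before paging, and the determinism hash combines the policy digest with the ordered input digests. The page size of 1024 comes from the text; the exact byte layout of the hash, the sort used to order input digests, and the function names are assumptions.

```python
import hashlib

PAGE_SIZE = 1024  # default page size stated in step 2


def batches(tuples, page_size=PAGE_SIZE):
    """Yield pages of (sbom_id, advisory_id) tuples in a stable order."""
    ordered = sorted(tuples)  # by SBOM id first, then advisory id
    for start in range(0, len(ordered), page_size):
        yield ordered[start:start + page_size]


def determinism_hash(policy_digest, input_digests) -> str:
    """Hash the policy digest followed by the input digests in sorted order."""
    h = hashlib.sha256()
    h.update(policy_digest.encode("utf-8"))
    for digest in sorted(input_digests):
        h.update(b"\n")  # separator so adjacent digests cannot run together
        h.update(digest.encode("utf-8"))
    return "sha256:" + h.hexdigest()
```

Sorting on both sides means the arrival order of inputs does not affect either the batches or the resulting hash, which is the property a replay comparison relies on.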
|
||||
|
||||
---
|
||||
|
||||
## 5 · Retry, Replay & Determinism
|
||||
|
||||
- **Retries:** Failures (network, validation) mark run `status=failed` and enqueue retry with exponential backoff capped at 3 attempts. Manual re-run via CLI resets counters.
|
||||
- **Replay:**
|
||||
- Use `policy_runs` record to assemble input snapshot (policy version, cursors, env).
|
||||
- Fetch associated SBOM/advisory/VEX data via `stella policy replay --run <id>` which rehydrates data into a sealed bundle.
|
||||
- Determinism hash mismatches between replay and recorded run indicate drift; CI job `DEVOPS-POLICY-20-003` compares successive runs to guard against this.
|
||||
- **Cancellation:** Manual `stella policy run cancel <runId>` or orchestrator TTL triggers `status=canceled`; partial changes roll back via history append (no destructive delete).
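
The retry rule can be illustrated with a small delay schedule. The cap of 3 comes from the text and is read here as three retries; the 30-second base delay and the doubling factor are assumptions, and `retry_delay` is a hypothetical helper.

```python
MAX_ATTEMPTS = 3   # cap stated in the Retries bullet
BASE_DELAY_S = 30  # assumed base delay; not specified in this guide


def retry_delay(attempt):
    """Delay in seconds before retry number `attempt` (1-based), or None when exhausted."""
    if attempt > MAX_ATTEMPTS:
        return None
    return BASE_DELAY_S * 2 ** (attempt - 1)
```

A `None` result corresponds to the point at which the orchestrator stops retrying and the run stays `failed` until someone re-runs it manually.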
|
||||
|
||||
---
|
||||
|
||||
## 6 · Trigger Sources & Scheduling
|
||||
|
||||
| Source | Description | SLAs |
|
||||
|--------|-------------|------|
|
||||
| **Nightly full run** | Default schedule per tenant; ensures baseline alignment. | Finish before 07:00 UTC |
|
||||
| **Change stream** | Concelier (`advisory_raw`), Excititor (`vex_raw`), SBOM imports emit `policy.trigger.delta` events. | Start within 60 s; complete within 5 min |
|
||||
| **Manual CLI/Console** | Operators run ad-hoc evaluations. | No SLA; a warning is shown if the warm path exceeds its target |
|
||||
| **CI** | `stella policy simulate` runs in pipelines referencing golden SBOMs. | Must complete under 10 min to avoid pipeline timeout |
|
||||
|
||||
The orchestrator enforces max concurrency per tenant (`maxActiveRuns`), queue depth alarms, and fairness (round-robin per policy).
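
A round-robin pick under a per-tenant concurrency cap might look like the sketch below. It models a single tenant, takes one pending run per policy in turn, and stops at `max_active_runs`; the `schedule` function and the data layout are assumptions, not the orchestrator's actual API.

```python
from collections import deque


def schedule(queues, max_active_runs):
    """Pick run ids to start now for one tenant.

    queues maps policy id -> list of pending run ids (oldest first).
    """
    pending = {p: deque(runs) for p, runs in sorted(queues.items()) if runs}
    started = []
    while pending and len(started) < max_active_runs:
        for policy_id in list(pending):  # copy: entries may be removed below
            if len(started) >= max_active_runs:
                break
            started.append(pending[policy_id].popleft())
            if not pending[policy_id]:
                del pending[policy_id]
    return started
```

Taking one run per policy on each pass prevents a policy with a long queue from starving policies that have only a single pending run.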
|
||||
|
||||
---
|
||||
|
||||
## 7 · Monitoring & Alerts
|
||||
|
||||
- **Metrics:** `policy_run_seconds`, `policy_run_queue_depth`, `policy_run_failures_total`, `policy_run_incremental_backlog`, `policy_rules_fired_total`.
|
||||
- **Dashboards:** Highlight pending approvals, incremental backlog age, top failing policies, VEX override ratios (tie-in with `/docs/observability/policy.md` once published).
|
||||
- **Alerts:**
|
||||
- Incremental backlog > 3 cycles.
|
||||
- Determinism hash mismatch.
|
||||
- Failure rate > 5 % over rolling hour.
|
||||
- Run duration > SLA (full > 30 min, incremental > 5 min).
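
The alert thresholds above can be expressed as a single check. In practice these would more likely be alerting rules in the metrics backend; this Python sketch only restates the thresholds, and the alert names and the `alerts` function signature are assumptions.

```python
# SLA limits from the list above, in seconds.
SLA_SECONDS = {"full": 30 * 60, "incremental": 5 * 60}


def alerts(backlog_cycles, hash_mismatch, failures, total_runs, mode, duration_s):
    """Return the names of the alerts that should fire for the given observations."""
    fired = []
    if backlog_cycles > 3:
        fired.append("incremental_backlog")
    if hash_mismatch:
        fired.append("determinism_mismatch")
    # Failure rate over the rolling hour; skipped when there were no runs.
    if total_runs and failures / total_runs > 0.05:
        fired.append("failure_rate")
    if duration_s > SLA_SECONDS.get(mode, float("inf")):
        fired.append("run_duration")
    return fired
```

All comparisons are strict, matching the `>` signs in the list, so a value exactly at a threshold does not fire.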

---

## 8 · Failure Handling & Rollback

- **Soft failures:** Worker retries; after final failure, orchestrator emits `policy.run.failed` with diagnostics and recommended actions (e.g., missing SBOM segment).
- **Hard failures:** A schema mismatch or determinism guard violation (`ERR_POL_004`) blocks further runs until resolved.
- **Rollback:** Operators can activate the previous policy version (see [Lifecycle guide](lifecycle.md)) and schedule a full run to restore prior state.

---

## 9 · Offline / Sealed Mode

- Change streams originate from offline bundle imports; orchestrator processes delta manifests.
- Runs execute with `sealed=true`, blocking any external lookups; `policy_runs.inputs.env.sealed` is set for auditing.
- Explain traces annotate cached data usage to prompt bundle refresh.
- Offline Kit exports include the latest `policy_runs` snapshot and determinism hashes for evidence lockers.

---

## 10 · Compliance Checklist

- [ ] **Run schemas validated:** `PolicyRunRequest` / `PolicyRunStatus` DTOs from Scheduler Models (`SCHED-MODELS-20-001`) serialise deterministically; schema samples up to date.
- [ ] **Cursor integrity:** Incremental runs persist advisory & VEX cursors; replay verifies identical input digests.
- [ ] **Queue fairness configured:** Tenant-level concurrency limits and lease timeouts applied; no starvation of lower-volume policies.
- [ ] **Determinism guard active:** CI replay job (`DEVOPS-POLICY-20-003`) green; determinism hash recorded on each run.
- [ ] **Observability wired:** Metrics exported, alerts configured, and run events flowing to Notifier/Timeline.
- [ ] **Offline tested:** `stella policy run --sealed` executed in air-gapped environment; explain traces flag cached evidence usage.
- [ ] **Recovery plan rehearsed:** Failure and rollback drill documented; incident checklist aligned with Lifecycle guide.

---

*Last updated: 2025-10-26 (Sprint 20).*
@@ -34,7 +34,7 @@ This runbook confirms that Scanner.WebService now surfaces the metadata Runtime
```
(Use `npm install --no-save ajv ajv-cli ajv-formats` once per clone.)

-> Snapshot fixtures: see `docs/events/samples/scanner.report.ready@1.sample.json` for a canonical event that already carries `quietedFindingCount`.
+> Snapshot fixtures: see `docs/events/samples/scanner.event.report.ready@1.sample.json` for a canonical orchestrator event that already carries `quietedFindingCount`.

---
71
docs/schemas/policy-diff-summary.schema.json
Normal file
@@ -0,0 +1,71 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "PolicyDiffSummary",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "SchemaVersion": {
      "type": "string"
    },
    "Added": {
      "type": "integer",
      "format": "int32"
    },
    "Removed": {
      "type": "integer",
      "format": "int32"
    },
    "Unchanged": {
      "type": "integer",
      "format": "int32"
    },
    "BySeverity": {
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/PolicyDiffSeverityDelta"
      }
    },
    "RuleHits": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/PolicyDiffRuleDelta"
      }
    }
  },
  "definitions": {
    "PolicyDiffSeverityDelta": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Up": {
          "type": "integer",
          "format": "int32"
        },
        "Down": {
          "type": "integer",
          "format": "int32"
        }
      }
    },
    "PolicyDiffRuleDelta": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "RuleId": {
          "type": "string"
        },
        "RuleName": {
          "type": "string"
        },
        "Up": {
          "type": "integer",
          "format": "int32"
        },
        "Down": {
          "type": "integer",
          "format": "int32"
        }
      }
    }
  }
}
258
docs/schemas/policy-explain-trace.schema.json
Normal file
@@ -0,0 +1,258 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "PolicyExplainTrace",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "SchemaVersion": {
      "type": "string"
    },
    "FindingId": {
      "type": "string"
    },
    "PolicyId": {
      "type": "string"
    },
    "PolicyVersion": {
      "type": "integer",
      "format": "int32"
    },
    "TenantId": {
      "type": "string"
    },
    "RunId": {
      "type": "string"
    },
    "EvaluatedAt": {
      "type": "string",
      "format": "date-time"
    },
    "Verdict": {
      "$ref": "#/definitions/PolicyExplainVerdict"
    },
    "RuleChain": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/PolicyExplainRule"
      }
    },
    "Evidence": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/PolicyExplainEvidence"
      }
    },
    "VexImpacts": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/PolicyExplainVexImpact"
      }
    },
    "History": {
      "type": "array",
      "items": {
        "$ref": "#/definitions/PolicyExplainHistoryEvent"
      }
    },
    "Metadata": {
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
  },
  "definitions": {
    "PolicyExplainVerdict": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Status": {
          "$ref": "#/definitions/PolicyVerdictStatus"
        },
        "Severity": {
          "oneOf": [
            {
              "type": "null"
            },
            {
              "$ref": "#/definitions/SeverityRank"
            }
          ]
        },
        "Quiet": {
          "type": "boolean"
        },
        "Score": {
          "type": [
            "null",
            "number"
          ],
          "format": "double"
        },
        "Rationale": {
          "type": [
            "null",
            "string"
          ]
        }
      }
    },
    "PolicyVerdictStatus": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "Passed",
        "Warned",
        "Blocked",
        "Quieted",
        "Ignored"
      ],
      "enum": [
        0,
        1,
        2,
        3,
        4
      ]
    },
    "SeverityRank": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "None",
        "Info",
        "Low",
        "Medium",
        "High",
        "Critical",
        "Unknown"
      ],
      "enum": [
        0,
        1,
        2,
        3,
        4,
        5,
        6
      ]
    },
    "PolicyExplainRule": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "RuleId": {
          "type": "string"
        },
        "RuleName": {
          "type": "string"
        },
        "Action": {
          "type": "string"
        },
        "Decision": {
          "type": "string"
        },
        "Score": {
          "type": "number",
          "format": "double"
        },
        "Condition": {
          "type": [
            "null",
            "string"
          ]
        }
      }
    },
    "PolicyExplainEvidence": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Type": {
          "type": "string"
        },
        "Reference": {
          "type": "string"
        },
        "Source": {
          "type": "string"
        },
        "Status": {
          "type": "string"
        },
        "Weight": {
          "type": "number",
          "format": "double"
        },
        "Justification": {
          "type": [
            "null",
            "string"
          ]
        },
        "Metadata": {
          "type": "object",
          "additionalProperties": {
            "type": "string"
          }
        }
      }
    },
    "PolicyExplainVexImpact": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "StatementId": {
          "type": "string"
        },
        "Provider": {
          "type": "string"
        },
        "Status": {
          "type": "string"
        },
        "Accepted": {
          "type": "boolean"
        },
        "Justification": {
          "type": [
            "null",
            "string"
          ]
        },
        "Confidence": {
          "type": [
            "null",
            "string"
          ]
        }
      }
    },
    "PolicyExplainHistoryEvent": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Status": {
          "type": "string"
        },
        "OccurredAt": {
          "type": "string",
          "format": "date-time"
        },
        "Actor": {
          "type": [
            "null",
            "string"
          ]
        },
        "Note": {
          "type": [
            "null",
            "string"
          ]
        }
      }
    }
  }
}
130
docs/schemas/policy-run-request.schema.json
Normal file
@@ -0,0 +1,130 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "PolicyRunRequest",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "SchemaVersion": {
      "type": "string"
    },
    "TenantId": {
      "type": "string"
    },
    "PolicyId": {
      "type": "string"
    },
    "PolicyVersion": {
      "type": [
        "integer",
        "null"
      ],
      "format": "int32"
    },
    "Mode": {
      "$ref": "#/definitions/PolicyRunMode"
    },
    "Priority": {
      "$ref": "#/definitions/PolicyRunPriority"
    },
    "RunId": {
      "type": [
        "null",
        "string"
      ]
    },
    "QueuedAt": {
      "type": [
        "null",
        "string"
      ],
      "format": "date-time"
    },
    "RequestedBy": {
      "type": [
        "null",
        "string"
      ]
    },
    "CorrelationId": {
      "type": [
        "null",
        "string"
      ]
    },
    "Metadata": {
      "type": [
        "null",
        "object"
      ],
      "additionalProperties": {
        "type": "string"
      }
    },
    "Inputs": {
      "$ref": "#/definitions/PolicyRunInputs"
    }
  },
  "definitions": {
    "PolicyRunMode": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "Full",
        "Incremental",
        "Simulate"
      ],
      "enum": [
        0,
        1,
        2
      ]
    },
    "PolicyRunPriority": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "Normal",
        "High",
        "Emergency"
      ],
      "enum": [
        0,
        1,
        2
      ]
    },
    "PolicyRunInputs": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "SbomSet": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "AdvisoryCursor": {
          "type": [
            "null",
            "string"
          ],
          "format": "date-time"
        },
        "VexCursor": {
          "type": [
            "null",
            "string"
          ],
          "format": "date-time"
        },
        "Environment": {
          "type": "object",
          "additionalProperties": {}
        },
        "CaptureExplain": {
          "type": "boolean"
        }
      }
    }
  }
}
217
docs/schemas/policy-run-status.schema.json
Normal file
@@ -0,0 +1,217 @@
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "PolicyRunStatus",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "SchemaVersion": {
      "type": "string"
    },
    "RunId": {
      "type": "string"
    },
    "TenantId": {
      "type": "string"
    },
    "PolicyId": {
      "type": "string"
    },
    "PolicyVersion": {
      "type": "integer",
      "format": "int32"
    },
    "Mode": {
      "$ref": "#/definitions/PolicyRunMode"
    },
    "Status": {
      "$ref": "#/definitions/PolicyRunExecutionStatus"
    },
    "Priority": {
      "$ref": "#/definitions/PolicyRunPriority"
    },
    "QueuedAt": {
      "type": "string",
      "format": "date-time"
    },
    "StartedAt": {
      "type": [
        "null",
        "string"
      ],
      "format": "date-time"
    },
    "FinishedAt": {
      "type": [
        "null",
        "string"
      ],
      "format": "date-time"
    },
    "DeterminismHash": {
      "type": [
        "null",
        "string"
      ]
    },
    "ErrorCode": {
      "type": [
        "null",
        "string"
      ]
    },
    "Error": {
      "type": [
        "null",
        "string"
      ]
    },
    "Attempts": {
      "type": "integer",
      "format": "int32"
    },
    "TraceId": {
      "type": [
        "null",
        "string"
      ]
    },
    "ExplainUri": {
      "type": [
        "null",
        "string"
      ]
    },
    "Metadata": {
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "Stats": {
      "$ref": "#/definitions/PolicyRunStats"
    },
    "Inputs": {
      "$ref": "#/definitions/PolicyRunInputs"
    }
  },
  "definitions": {
    "PolicyRunMode": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "Full",
        "Incremental",
        "Simulate"
      ],
      "enum": [
        0,
        1,
        2
      ]
    },
    "PolicyRunExecutionStatus": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "Queued",
        "Running",
        "Succeeded",
        "Failed",
        "Cancelled",
        "ReplayPending"
      ],
      "enum": [
        0,
        1,
        2,
        3,
        4,
        5
      ]
    },
    "PolicyRunPriority": {
      "type": "integer",
      "description": "",
      "x-enumNames": [
        "Normal",
        "High",
        "Emergency"
      ],
      "enum": [
        0,
        1,
        2
      ]
    },
    "PolicyRunStats": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Components": {
          "type": "integer",
          "format": "int32"
        },
        "RulesFired": {
          "type": "integer",
          "format": "int32"
        },
        "FindingsWritten": {
          "type": "integer",
          "format": "int32"
        },
        "VexOverrides": {
          "type": "integer",
          "format": "int32"
        },
        "Quieted": {
          "type": "integer",
          "format": "int32"
        },
        "Suppressed": {
          "type": "integer",
          "format": "int32"
        },
        "DurationSeconds": {
          "type": [
            "null",
            "number"
          ],
          "format": "double"
        }
      }
    },
    "PolicyRunInputs": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "SbomSet": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "AdvisoryCursor": {
          "type": [
            "null",
            "string"
          ],
          "format": "date-time"
        },
        "VexCursor": {
          "type": [
            "null",
            "string"
          ],
          "format": "date-time"
        },
        "Environment": {
          "type": "object",
          "additionalProperties": {}
        },
        "CaptureExplain": {
          "type": "boolean"
        }
      }
    }
  }
}
194
docs/security/authority-scopes.md
Normal file
@@ -0,0 +1,194 @@
# Authority Scopes & Tenancy — AOC Update

> **Audience:** Authority Core, platform security engineers, DevOps owners.
> **Scope:** Scope taxonomy, tenancy enforcement, rollout guidance for the Aggregation-Only Contract (Sprint 19).

Authority issues short-lived tokens bound to tenants and scopes. Sprint 19 introduces new scopes to support the AOC guardrails in Concelier and Excititor. This document lists the canonical scope catalogue, describes tenancy propagation, and outlines operational safeguards.

---

## 1 · Scope catalogue (post AOC)

| Scope | Surface | Purpose | Notes |
|-------|---------|---------|-------|
| `advisory:write` | Concelier ingestion APIs | Allows append-only writes to `advisory_raw`. | Granted to Concelier WebService and trusted connectors. Requires tenant claim. |
| `advisory:verify` | Concelier `/aoc/verify`, CLI, UI dashboard | Permits guard verification and access to violation summaries. | Read-only; used by `stella aoc verify` and console dashboard. |
| `vex:write` | Excititor ingestion APIs | Append-only writes to `vex_raw`. | Mirrors `advisory:write`. |
| `vex:verify` | Excititor `/aoc/verify`, CLI | Read-only verification of VEX ingestion. | Optional for environments without VEX feeds. |
| `graph:write` | Cartographer build pipeline | Enqueue graph build/overlay jobs. | Reserved for the Cartographer service identity; requires tenant claim. |
| `graph:read` | Graph API, Scheduler overlays, UI | Read graph projections/overlays. | Requires tenant claim; granted to Cartographer, Graph API, Scheduler. |
| `graph:export` | Graph export endpoints | Stream GraphML/JSONL artefacts. | UI/gateway automation only; tenant required. |
| `graph:simulate` | Policy simulation overlays | Trigger what-if overlays on graphs. | Restricted to automation; tenant required. |
| `effective:write` | Policy Engine | Allows creation/update of `effective_finding_*` collections. | **Only** the Policy Engine service client may hold this scope. |
| `effective:read` | Console, CLI, exports | Read derived findings. | Shared across tenants with role-based restrictions. |
| `aoc:dashboard` | Console UI | Access AOC dashboard resources. | Bundles `advisory:verify`/`vex:verify` by default; keep for UI RBAC group mapping. |
| `aoc:verify` | Automation service accounts | Execute verification via API without the full dashboard role. | For CI pipelines, offline kit validators. |
| Existing scopes | (e.g., `policy:*`, `sbom:*`) | Unchanged. | Review `/docs/security/policy-governance.md` for policy-specific scopes. |

### 1.1 Scope bundles (roles)

- **`role/concelier-ingest`** → `advisory:write`, `advisory:verify`.
- **`role/excititor-ingest`** → `vex:write`, `vex:verify`.
- **`role/aoc-operator`** → `aoc:dashboard`, `aoc:verify`, `advisory:verify`, `vex:verify`.
- **`role/policy-engine`** → `effective:write`, `effective:read`.
- **`role/cartographer-service`** → `graph:write`, `graph:read`.
- **`role/graph-gateway`** → `graph:read`, `graph:export`, `graph:simulate`.

Roles are declared per tenant in `authority.yaml`:

```yaml
tenants:
  - name: default
    roles:
      concelier-ingest:
        scopes: [advisory:write, advisory:verify]
      aoc-operator:
        scopes: [aoc:dashboard, aoc:verify, advisory:verify, vex:verify]
      policy-engine:
        scopes: [effective:write, effective:read]
```
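
As a rough illustration of how the role bundles in section 1.1 resolve to scopes, the sketch below unions the scopes of each role assigned to a client. The bundle contents are copied from the list above; the `scopes_for_roles` helper is hypothetical and does not describe how Authority resolves roles internally.

```python
# Role bundles copied from section 1.1 (without the "role/" prefix).
ROLE_BUNDLES = {
    "concelier-ingest": {"advisory:write", "advisory:verify"},
    "excititor-ingest": {"vex:write", "vex:verify"},
    "aoc-operator": {"aoc:dashboard", "aoc:verify", "advisory:verify", "vex:verify"},
    "policy-engine": {"effective:write", "effective:read"},
    "cartographer-service": {"graph:write", "graph:read"},
    "graph-gateway": {"graph:read", "graph:export", "graph:simulate"},
}


def scopes_for_roles(roles):
    """Union the scopes of every named role; an unknown role raises KeyError."""
    granted = set()
    for role in roles:
        granted |= ROLE_BUNDLES[role]
    # Sorted output keeps the result stable for logging and comparison.
    return sorted(granted)
```

Failing loudly on an unknown role is a deliberate choice in this sketch: a typo in a role name should not silently grant an empty scope set.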

---

## 2 · Tenancy enforcement

### 2.1 Token claims

Tokens now include:

- `tenant` claim (string) — required for all ingestion and verification scopes.
- `service_identity` (optional) — e.g., `policy-engine`, `cartographer`. Required when requesting `effective:write` or `graph:write`.
- `delegation_allowed` (boolean) — defaults to `false`. Prevents console tokens from delegating ingest scopes.

Authority rejects requests when:

- `tenant` is missing while requesting `advisory:*`, `vex:*`, or `aoc:*` scopes.
- `service_identity != policy-engine` but `effective:write` is present (`ERR_AOC_006` enforcement).
- `service_identity != cartographer` but `graph:write` is present (graph pipeline enforcement).
- Tokens attempt to combine `advisory:write` with `effective:write` (separation of duties).
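
The four rejection rules above can be sketched as one validation function. This is a minimal illustration, not Authority's implementation: the function name, its parameters, and the reason strings are assumptions made for the example; only the rules themselves come from the list.

```python
# Scope families that need a tenant claim, per the first rejection rule.
TENANT_BOUND_PREFIXES = ("advisory:", "vex:", "aoc:")

# Scopes reserved for a single service identity (second and third rules).
REQUIRED_IDENTITY = {
    "effective:write": "policy-engine",
    "graph:write": "cartographer",
}


def rejection_reasons(scopes, tenant=None, service_identity=None):
    """Return human-readable reasons a token request would be rejected."""
    scopes = set(scopes)
    reasons = []
    if tenant is None and any(s.startswith(TENANT_BOUND_PREFIXES) for s in scopes):
        reasons.append("tenant claim missing")
    for scope, identity in REQUIRED_IDENTITY.items():
        if scope in scopes and service_identity != identity:
            reasons.append(f"{scope} requires service_identity={identity}")
    # Separation of duties: raw ingestion and materialisation never share a token.
    if {"advisory:write", "effective:write"} <= scopes:
        reasons.append("advisory:write cannot be combined with effective:write")
    return reasons
```

An empty list means none of the listed rules fired. Collecting every reason, rather than stopping at the first, makes a rejected request easier to diagnose in an audit log.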

### 2.2 Propagation

- API Gateway forwards the `tenant` claim as a header (`X-Stella-Tenant`). Services refuse requests lacking the header.
- Concelier/Excititor stamp the tenant into raw documents and structured logs.
- Policy Engine copies `tenant` from tokens into `effective_finding_*` collections.
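
The propagation contract can be illustrated with two small helpers, one for each side of the gateway. Only the `X-Stella-Tenant` header name comes from the text; the function names, the plain-dict request model, and the choice of `PermissionError` are assumptions made for this sketch.

```python
TENANT_HEADER = "X-Stella-Tenant"


def gateway_headers(token_claims, headers):
    """Gateway side: copy the token's tenant claim into the outgoing headers."""
    forwarded = dict(headers)
    forwarded[TENANT_HEADER] = token_claims["tenant"]
    return forwarded


def resolve_tenant(headers):
    """Service side: return the tenant, or refuse when the header is absent or blank."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    tenant = normalised.get(TENANT_HEADER.lower(), "").strip()
    if not tenant:
        raise PermissionError(f"missing {TENANT_HEADER} header")
    return tenant
```

A service would call `resolve_tenant` before touching any tenant-scoped collection, so that a missing header fails the request instead of falling back to a default tenant.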

### 2.3 Cross-tenant scenarios

- Platform operators with `tenant:admin` can assume other tenants via `/authority/tenant/switch` if explicitly permitted.
- CLI commands accept `--tenant <id>` to override environment default; Authority logs tenant switch events (`authority.tenant.switch`).
- Console tenant picker uses delegated token exchange (`/token/exchange`) to obtain scoped tenant tokens without exposing raw credentials.

---

## 3 · Configuration changes

### 3.1 Authority configuration (`authority.yaml`)

Add new scopes and optional claims transformations:

```yaml
security:
  scopes:
    - name: advisory:write
      description: Concelier raw ingestion
    - name: advisory:verify
      description: Verify Concelier ingestion
    - name: vex:write
      description: Excititor raw ingestion
    - name: vex:verify
      description: Verify Excititor ingestion
    - name: aoc:dashboard
      description: Access AOC UI dashboards
    - name: aoc:verify
      description: Run AOC verification
    - name: effective:write
      description: Policy Engine materialisation
    - name: effective:read
      description: Read derived findings
  claimTransforms:
    - match: { scope: "effective:write" }
      require:
        serviceIdentity: policy-engine
    - match: { scope: "graph:write" }
      require:
        serviceIdentity: cartographer
```

### 3.2 Client registration

Update service clients:

- `Concelier.WebService` → request `advisory:write`, `advisory:verify`.
- `Excititor.WebService` → request `vex:write`, `vex:verify`.
- `Policy.Engine` → request `effective:write`, `effective:read`; set `properties.serviceIdentity=policy-engine`.
- `Cartographer.Service` → request `graph:write`, `graph:read`; set `properties.serviceIdentity=cartographer`.
- `Graph API Gateway` → request `graph:read`, `graph:export`, `graph:simulate`; tenant hint required.
- `Console` → request `aoc:dashboard`, `effective:read` plus existing UI scopes.
- `CLI automation` → request `aoc:verify`, `advisory:verify`, `vex:verify` as needed.

Client definition snippet:

```yaml
clients:
  - clientId: concelier-web
    grantTypes: [client_credentials]
    scopes: [advisory:write, advisory:verify]
    tenants: [default]
  - clientId: policy-engine
    grantTypes: [client_credentials]
    scopes: [effective:write, effective:read]
    properties:
      serviceIdentity: policy-engine
  - clientId: cartographer-service
    grantTypes: [client_credentials]
    scopes: [graph:write, graph:read]
    properties:
      serviceIdentity: cartographer
```

---

## 4 · Operational safeguards

- **Audit events:** Authority emits `authority.scope.granted` and `authority.scope.revoked` events with `scope` and `tenant`. Monitor for unexpected grants.
- **Rate limiting:** Apply stricter limits on `/token` endpoints for clients requesting `advisory:write` or `vex:write` to mitigate brute-force ingestion attempts.
- **Incident response:** Link AOC alerts to Authority audit logs to confirm whether violations come from expected identities.
- **Rotation:** Rotate ingest client secrets alongside guard deployments; add rotation steps to `ops/authority-key-rotation.md`.
- **Testing:** Integration tests must fail if tokens lacking `tenant` attempt ingestion; add coverage in Concelier/Excititor smoke suites (see `CONCELIER-CORE-AOC-19-013`).

---

## 5 · Offline & air-gap notes

- Offline Kit bundles include tenant-scoped service credentials. Ensure ingest bundles ship without `advisory:write` scopes unless strictly required.
- CLI verification in offline environments uses pre-issued `aoc:verify` tokens; document expiration and renewal processes.
- Authority replicas in air-gapped environments should restrict scope issuance to known tenants and log all `/token` interactions for later replay.

---

## 6 · References

- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
- [Architecture overview](../architecture/overview.md)
- [Concelier architecture](../ARCHITECTURE_CONCELIER.md)
- [Excititor architecture](../ARCHITECTURE_EXCITITOR.md)
- [Policy governance](policy-governance.md)
- [Authority key rotation playbook](../ops/authority-key-rotation.md)

---

## 7 · Compliance checklist

- [ ] Scope catalogue updated in Authority configuration templates.
- [ ] Role mappings documented for each tenant profile.
- [ ] Claim transforms enforce `serviceIdentity` for `effective:write`.
- [ ] Claim transforms enforce `serviceIdentity` for `graph:write`.
- [ ] Concelier/Excititor smoke tests cover missing tenant rejection.
- [ ] Offline kit credentials reviewed for least privilege.
- [ ] Audit/monitoring guidance validated with Observability Guild.
- [ ] Authority Core sign-off recorded (owner: @authority-core, due 2025-10-28).

---

*Last updated: 2025-10-26 (Sprint 19).*
114
docs/security/policy-governance.md
Normal file
@@ -0,0 +1,114 @@
# Policy Governance & Least Privilege

> **Audience:** Security Guild, Policy Guild, Authority Core, auditors.
> **Scope:** Scopes, RBAC, approval controls, tenancy, auditing, and compliance requirements for Policy Engine v2.

---

## 1 · Governance Principles

1. **Least privilege by scope** – API clients receive only the `policy:*` scopes required for their role; `effective:write` is reserved for the service identity.
2. **Immutable history** – All policy changes, approvals, runs, and suppressions produce audit artefacts retrievable offline.
3. **Separation of duties** – Authors cannot approve their own submissions; approvers require a distinct scope and should not have deployment rights.
4. **Deterministic verification** – Simulations, determinism checks, and incident replay bundles provide reproducible evidence for auditors.
5. **Tenant isolation** – Policies, runs, and findings are scoped to tenants; cross-tenant access requires explicit admin scopes and is logged.
6. **Offline parity** – Air-gapped sites follow the same governance workflow with sealed-mode safeguards and signed bundles.

---

## 2 · Authority Scopes & Role Mapping

| Scope | Description | Recommended role |
|-------|-------------|------------------|
| `policy:read` | View policies, revisions, runs, findings. | Readers, auditors. |
| `policy:write` | Create/edit drafts, run lint/compile. | Authors (SecOps engineers). |
| `policy:submit` | Move draft → submitted, attach simulations. | Authors with submission rights. |
| `policy:review` | Comment/approve/request changes (non-final). | Reviewers (peer security, product). |
| `policy:approve` | Final approval; can archive. | Approval board/security lead. |
| `policy:activate` | Promote approved version, schedule activation. | Runtime operators / release managers. |
| `policy:run` | Trigger runs, inspect live status. | Operators, automation bots. |
| `policy:runs` | Read run history, replay bundles. | Operators, auditors. |
| `policy:archive` | Retire versions, perform rollbacks. | Approvers, operators. |
| `policy:simulate` | Execute simulations via API/CLI. | Authors, reviewers, CI. |
| `policy:operate` | Activate incident mode, toggle sampling. | SRE/on-call. |
| `findings:read` | View effective findings/explain. | Analysts, auditors, CLI. |
| `effective:write` | **Service only** – materialise findings. | Policy Engine service principal. |

> Map organisation roles to scopes via Authority issuer config (`authority.tenants[].roles`). Document assignments in the tenant onboarding checklist.

> **Authority configuration tip:** the Policy Engine service client must include `properties.serviceIdentity: policy-engine` and a tenant hint in `authority.yaml`. Authority rejects `effective:write` tokens that lack this marker. See [Authority scopes](authority-scopes.md) for the full scope catalogue.

---

## 3 · Workflow Controls

- **Submit gate:** CLI/UI require fresh lint + simulation artefacts (<24 h). Submissions store reviewer list and diff attachments.
- **Review quorum:** Authority policy enforces minimum reviewers (e.g., 2) and optional separation between functional/security domains.
- **Approval guard:** Approvers must acknowledge simulation + determinism check completion. CLI enforces `--note` and `--attach` fields.
- **Activation guard:** Policy Engine refuses activation when latest full run status ≠ success or incremental backlog aged > SLA.
- **Rollback policy:** Rollbacks require incident reference and produce `policy.rollback` audit events.
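
Three of the controls above reduce to simple predicates, sketched below under stated assumptions. The 24-hour freshness window and the example quorum of two reviewers come from the text; the function names, argument shapes, and the `"success"` status string are illustrative and not taken from the Policy Engine API. Excluding the author from the quorum reflects the separation-of-duties principle in section 1.

```python
from datetime import datetime, timedelta, timezone

MAX_ARTEFACT_AGE = timedelta(hours=24)  # "fresh" lint and simulation artefacts
MIN_REVIEWERS = 2                       # example quorum from the text


def submit_gate_ok(lint_at, simulation_at, now):
    """Submit gate: both artefacts must be younger than 24 hours."""
    return all(now - ts < MAX_ARTEFACT_AGE for ts in (lint_at, simulation_at))


def review_quorum_ok(author, approving_reviewers, minimum=MIN_REVIEWERS):
    """Review quorum: count distinct approvers, never the author."""
    distinct = set(approving_reviewers) - {author}
    return len(distinct) >= minimum


def activation_allowed(latest_full_run_status, backlog_age, backlog_sla):
    """Activation guard: last full run succeeded and backlog is within SLA."""
    return latest_full_run_status == "success" and backlog_age <= backlog_sla
```

For instance, a submission whose lint artefact is 25 hours old fails `submit_gate_ok` even if its simulation was produced a minute ago, because both artefacts must be fresh.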

---

## 4 · Tenancy & Data Access

- Policies stored per tenant; `tenant-global` used for shared baselines.
- API filters all requests by `X-Stella-Tenant` (default from token). Cross-tenant requests require `policy:tenant-admin`.
- Effective findings collections include `tenant` field and unique indexes preventing cross-tenant writes.
- CLI/Console display tenant context prominently; switching tenant triggers warnings when active policy differs.
- Offline bundles encode tenant metadata; import commands validate compatibility before applying.
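
The header-versus-token rule in the second bullet can be sketched as follows. The `policy:tenant-admin` scope and the "default from token" behaviour come from the text; the function name and signature are hypothetical.

```python
def effective_tenant(token_tenant, token_scopes, header_tenant=None):
    """Pick the tenant a request runs under, or refuse a cross-tenant request."""
    # With no X-Stella-Tenant header, fall back to the tenant bound to the token.
    requested = header_tenant or token_tenant
    if requested != token_tenant and "policy:tenant-admin" not in token_scopes:
        raise PermissionError("cross-tenant access requires policy:tenant-admin")
    return requested
```

Under this reading, a header that names the token's own tenant is always accepted, and only a differing tenant triggers the admin-scope check.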

---

## 5 · Audit & Evidence

- **Collections:** `policies`, `policy_reviews`, `policy_history`, `policy_runs`, `policy_run_events`, `effective_finding_*_history`.
- **Events:** `policy.submitted`, `policy.review.requested`, `policy.approved`, `policy.activated`, `policy.archived`, `policy.run.*`, `policy.incident.*`.
- **Explain traces:** Stored for critical findings (sampled); available via CLI/UI for auditors (requires `findings:read`).
- **Offline evidence:** `stella policy bundle export` produces DSSE-signed packages containing DSL, IR digest, simulations, approval notes, run summaries, trace metadata.
- **Retention:** Default 365 days for run history, extendable per compliance requirements; incident mode extends to 30 days minimum.

---

## 6 · Secrets & Configuration Hygiene

- Policy Engine configuration loaded from environment/secret stores; no secrets in repo.
- CLI profiles should store tokens encrypted (`stella profile set --secret`).
- UI/CLI logs redact tokens, reviewer emails, and attachments.
- Rotating tokens/keys: Authority exposes `policy scopes` in discovery docs; follow `/docs/security/authority-scopes.md` for rotation.
- Use `policy:operate` to disable self-service simulation temporarily during incident response if needed.

---

## 7 · Incident Response

- Trigger incident mode for determinism violations, backlog surges, or suspected policy abuse.
- Capture replay bundles and run `stella policy run replay` for affected runs.
- Coordinate with Observability dashboards (see `/docs/observability/policy.md`) to monitor queue depth and failures.
- After resolution, document remediation in Lifecycle guide (§8) and attach to approval history.

---

## 8 · Offline / Air-Gapped Governance

- Same scopes apply; tokens issued by local Authority.
- Approvers must use offline UI/CLI to sign submissions; attachments stored locally.
- Bundle import/export must be signed (DSSE + cosign). CLI warns if signatures are missing.
- Sealed-mode banner reminds operators to refresh bundles when staleness thresholds are exceeded.
- Offline audits rely on evidence bundles and local `policy_runs` snapshot.

---

## 9 · Compliance Checklist

- [ ] **Scope mapping reviewed:** Authority issuer config updated; RBAC matrix stored with change request.
- [ ] **Separation enforced:** Automated checks block self-approval; review quorum satisfied.
- [ ] **Activation guard documented:** Operators trained on run health checks before promoting.
- [ ] **Audit exports tested:** Evidence bundles verified (hash/signature) and stored per compliance policy.
- [ ] **Incident drills rehearsed:** Replay/rollback procedures executed and logged.
- [ ] **Offline parity confirmed:** Air-gapped site executes submit/approve flow with sealed-mode guidance.
- [ ] **Documentation cross-links:** References to lifecycle, runs, observability, CLI, and API docs validated.

---

*Last updated: 2025-10-26 (Sprint 20).*
174
docs/ui/admin.md
Normal file
@@ -0,0 +1,174 @@
|
||||
# StellaOps Console - Admin Workspace
|
||||
|
||||
> **Audience:** Authority Guild, Console admins, support engineers, tenant operators.
|
||||
> **Scope:** Tenant management, role mapping, token lifecycle, integrations, fresh-auth prompts, security guardrails, offline behaviour, and compliance checklist for Sprint 23.
|
||||
|
||||
The Admin workspace centralises Authority-facing controls: tenants, roles, API clients, tokens, and integrations. It surfaces RBAC mappings, token issuance logs, and bootstrap flows with the same offline-first guarantees as the rest of the console.
|
||||
|
||||
---
|
||||
|
||||
## 1. Access and prerequisites
|
||||
|
||||
- **Route:** `/console/admin` with sub-routes for tenants, users, roles, tokens, integrations, audit, and bootstrap.
|
||||
- **Scopes:**
  - `ui.admin` (base access)
  - `authority:tenants.read` / `authority:tenants.write`
  - `authority:roles.read` / `authority:roles.write`
  - `authority:tokens.read` / `authority:tokens.revoke`
  - `authority:clients.read` / `authority:clients.write`
  - `authority:audit.read` (view audit trails)
|
||||
- **Fresh-auth:** Sensitive actions (token revoke, bootstrap key issue, signing key rotation) require a fresh-auth challenge.
|
||||
- **Dependencies:** Authority service (`/internal/*` APIs), revocation export, JWKS, licensing posture endpoint, integration config store.
|
||||
|
||||
---
|
||||
|
||||
## 2. Layout overview
|
||||
|
||||
```
+-------------------------------------------------------------------------+
| Header: Tenant picker - environment badge - security banner             |
+-------------------------------------------------------------------------+
| Tabs: Tenants | Roles & Scopes | Users & Tokens | Integrations | Audit  |
+-------------------------------------------------------------------------+
| Sidebar: Quick actions (Invite user, Create client, Export revocations) |
| Main panel varies per tab                                               |
+-------------------------------------------------------------------------+
```
|
||||
|
||||
The header includes an offline status indicator and a link to the Authority health page.
|
||||
|
||||
---
|
||||
|
||||
## 3. Tenants tab
|
||||
|
||||
| Field | Description |
|-------|-------------|
| **Tenant ID** | Lowercase slug used in tokens and client registrations. |
| **Display name** | Human-friendly name. |
| **Status** | `active`, `suspended`, `pending`. Suspended tenants block token issuance. |
| **Isolation mode** | `dedicated`, `shared`, or `sandbox`. Drives RBAC defaults. |
| **Default roles** | Roles automatically assigned to new users within the tenant. |
| **Offline snapshots** | Latest snapshot timestamp, checksum, operator. |
|
||||
|
||||
Actions:
|
||||
|
||||
- `Create tenant` (requires `authority:tenants.write`). Form captures display name, slug, isolation mode, default roles, bootstrap contact, optional plan metadata.
|
||||
- `Suspend/Resume` toggles token issuance and surfaces audit entry.
|
||||
- `Export tenant bundle` downloads tenant-specific revocation + JWKS package for air-gap distribution.
|
||||
- CLI parity: `stella auth tenant create --tenant <id>`, `stella auth tenant suspend --tenant <id>`.
|
||||
|
||||
---
|
||||
|
||||
## 4. Roles & scopes tab
|
||||
|
||||
- Table lists roles with mapped scopes and audiences.
|
||||
- Inline editor supports adding/removing scopes (with validation).
|
||||
- Scope categories: UI, Scanner, Concelier, Excititor, Policy, Attestor, Notifier, Scheduler, Offline kit.
|
||||
- Visual diff shows impact of changes on linked clients/users before committing.
|
||||
- "Effective permissions" view summarises what each role grants per service.
|
||||
- CLI parity: `stella auth role update --role ui.admin --add-scope authority:tokens.revoke`.
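
The visual diff described in the list above boils down to a set comparison between a role's current scopes and the proposed ones. The following is a minimal sketch of that idea, not the console's actual implementation; `ScopeDiff` and `diff_scopes` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ScopeDiff:
    """Result of comparing two scope sets (hypothetical helper type)."""
    added: FrozenSet[str]
    removed: FrozenSet[str]
    unchanged: FrozenSet[str]


def diff_scopes(current: Set[str], proposed: Set[str]) -> ScopeDiff:
    """Compare a role's current scopes with a proposed edit."""
    return ScopeDiff(
        added=frozenset(proposed - current),
        removed=frozenset(current - proposed),
        unchanged=frozenset(current & proposed),
    )


# Example: adding the revoke scope to a role, as in the CLI parity command.
current = {"ui.admin", "authority:tokens.read"}
proposed = current | {"authority:tokens.revoke"}
diff = diff_scopes(current, proposed)
print(sorted(diff.added), sorted(diff.removed))
```

A reviewer would typically be shown `added` and `removed` before committing; how the real UI maps these onto linked clients and users is not specified here.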
|
||||
|
||||
---
|
||||
|
||||
## 5. Users & tokens tab
|
||||
|
||||
Sections:
|
||||
|
||||
1. **User list** - identity, tenant, roles, last login, MFA status. Actions include reset password (if plugin supports), enforce fresh-auth, disable user.
|
||||
2. **Token inventory** - lists active tokens (access/refresh/device). Columns: token ID, type, subject, audience, issued at, expires, status. Toggle to show revoked tokens.
|
||||
3. **Token details** drawer shows claims, sender constraint (`cnf`), issuance metadata, revocation history.
|
||||
4. **Revoke token** action requires fresh-auth and prompts for reason (incident, user request, compromise).
|
||||
5. **Bulk revoke** (per tenant or role) triggers Authority revocation export to ensure downstream services purge caches.
|
||||
|
||||
Audit entries appear for every user/token change. CLI parity: `stella auth token revoke --token <id>`.
|
||||
|
||||
---
|
||||
|
||||
## 6. Integrations tab
|
||||
|
||||
- **Authority clients** list (service accounts) with grant types, allowed scopes, DPoP/mTLS settings, tenant hints, and rotation status.
|
||||
- **Bootstrap bundles** - downloadable templates for new clients/users; includes configuration YAML and CLI instructions.
|
||||
- **External IdP connectors** (optional) - displays status for SAML/OIDC plugins; includes metadata upload field and test login result.
|
||||
- **Licensing posture** - read-only panel summarising plan tier, entitlement expiry, and contact info (pulled from licensing service).
|
||||
- **Notifications** - optional webhook configuration for token events (on revoke, on failure).
|
||||
- CLI parity: `stella auth client create --client concelier --grant client_credentials --tenant prod`.
|
||||
|
||||
---
|
||||
|
||||
## 7. Audit tab
|
||||
|
||||
- Timeline view of administrative events (user changes, role updates, token revocations, bootstrap actions, key rotations).
|
||||
- Filters: event type, actor, tenant, scope, correlation ID.
|
||||
- Export button downloads CSV/JSON for SOC ingestion.
|
||||
- "Open in logs" copies search query pre-populated with correlation IDs.
|
||||
- CLI parity: `stella auth audit export --from 2025-10-20`.
|
||||
|
||||
---
|
||||
|
||||
## 8. Fresh-auth prompts
|
||||
|
||||
- High-risk actions (revoke all tokens, rotate signing key, create privileged client) trigger modal requiring credential re-entry or hardware key touch.
|
||||
- Fresh-auth window is 5 minutes; countdown displayed.
|
||||
- The UI indicates when the current session is outside the fresh-auth window; sensitive buttons stay disabled until re-auth.
|
||||
- Audit log records fresh-auth events (`authority.fresh_auth.start`, `authority.fresh_auth.success`).
|
||||
- CLI parity: `stella auth fresh-auth` obtains short-lived token for scriptable flows.
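
The 5-minute fresh-auth window and its countdown can be reasoned about as a simple time comparison. The sketch below assumes the client knows the timestamp of the last successful fresh-auth; the function names are hypothetical, and treating the exact 5-minute boundary as expired is an assumption rather than documented behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window length taken from the text above.
FRESH_AUTH_WINDOW = timedelta(minutes=5)


def fresh_auth_remaining(last_fresh_auth: Optional[datetime], now: datetime) -> timedelta:
    """Time left in the fresh-auth window; zero if expired or never performed."""
    if last_fresh_auth is None:
        return timedelta(0)
    remaining = FRESH_AUTH_WINDOW - (now - last_fresh_auth)
    return max(remaining, timedelta(0))


def sensitive_actions_enabled(last_fresh_auth: Optional[datetime], now: datetime) -> bool:
    """Sensitive buttons are enabled only while some window time remains."""
    return fresh_auth_remaining(last_fresh_auth, now) > timedelta(0)


t0 = datetime(2025, 10, 26, 12, 0, tzinfo=timezone.utc)
print(sensitive_actions_enabled(t0, t0 + timedelta(minutes=4)))
print(sensitive_actions_enabled(t0, t0 + timedelta(minutes=6)))
```

The remaining-time value is what a countdown display would render; the enabled flag is what would gate sensitive buttons.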
|
||||
|
||||
---
|
||||
|
||||
## 9. Security guardrails
|
||||
|
||||
- DPoP enforcement reminders for UI clients; console warns if any client lacks sender constraint.
|
||||
- mTLS enforcement summary for high-value audiences (Signer/Attestor).
|
||||
- Token policy checklists (access token TTL, refresh token policy) with alerts when deviating from defaults.
|
||||
- Revocation bundle export status (timestamp, digest, operator).
|
||||
- Key rotation panel showing current `kid`, last rotation, next scheduled rotation, and manual trigger button (ties into Authority rotate API).
|
||||
- CLI parity: `stella auth signing rotate` for script automation.
|
||||
|
||||
---
|
||||
|
||||
## 10. Offline and air-gap behaviour
|
||||
|
||||
- Offline banner indicates snapshot version; disables direct remote calls.
|
||||
- Tenant/role edits queue change manifests; UI instructs users to apply via CLI (`stella auth apply --bundle <file>`).
|
||||
- Token inventory shows snapshot state; revoke buttons generate scripts for offline Authority host.
|
||||
- Integrations tab offers manual download/upload for client definitions and IdP metadata.
|
||||
- Audit exports default to local storage with checksum output for transfer.
|
||||
|
||||
---
|
||||
|
||||
## 11. Screenshot coordination
|
||||
|
||||
- Placeholders:
|
||||
- ``
|
||||
- ``
|
||||
- ``
|
||||
- Capture real screenshots with Authority Guild once Sprint 23 UI is final (tracked in `#console-screenshots`, 2025-10-26 entry). Provide both light and dark theme variants.
|
||||
|
||||
---
|
||||
|
||||
## 12. References
|
||||
|
||||
- `/docs/ARCHITECTURE_AUTHORITY.md` - Authority architecture.
|
||||
- `/docs/11_AUTHORITY.md` - Authority service overview.
|
||||
- `/docs/security/authority-scopes.md` - scope definitions.
|
||||
- `/docs/ui/policies.md` - policy approvals requiring fresh-auth.
|
||||
- `/docs/ui/console-overview.md` - navigation shell.
|
||||
- `/docs/cli/authentication.md` (pending) and `/docs/cli/policy.md` for CLI flows.
|
||||
- `/docs/ops/scheduler-runbook.md` for integration with scheduler token rotation.
|
||||
|
||||
---
|
||||
|
||||
## 13. Compliance checklist
|
||||
|
||||
- [ ] Tenants, roles/scopes, and token management documented with actions and CLI parity.
|
||||
- [ ] Integrations and audit views covered.
|
||||
- [ ] Fresh-auth prompts and guardrails described.
|
||||
- [ ] Security controls (DPoP, mTLS, key rotation, revocations) captured.
|
||||
- [ ] Offline behaviour explained with script guidance.
|
||||
- [ ] Screenshot placeholders and coordination noted.
|
||||
- [ ] References validated.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 23).*
|
||||
|
||||
199
docs/ui/advisories-and-vex.md
Normal file
@@ -0,0 +1,199 @@
|
||||
# StellaOps Console - Advisories and VEX
|
||||
|
||||
> **Audience:** Console UX team, Concelier and Excititor guilds, support and compliance engineers.
|
||||
> **Scope:** Advisory aggregation UX, VEX consensus display, conflict indicators, raw document viewer, provenance banners, CLI parity, and Aggregation-Only Contract (AOC) guardrails for Sprint 23.
|
||||
|
||||
The Advisories and VEX surfaces expose Concelier and Excititor outputs without mutating the underlying data. Operators can review upstream statements, check consensus summaries, inspect conflicts, and hand off evidence to downstream tooling while staying within the Aggregation-Only Contract.
|
||||
|
||||
---
|
||||
|
||||
## 1. Access and prerequisites
|
||||
|
||||
- **Routes:**
  - `/console/advisories` (advisory list and detail)
  - `/console/vex` (VEX consensus and raw claim explorer)
|
||||
- **Scopes:** `advisory.read` and `vex.read` (base access), `advisory.verify` / `vex.verify` for verification actions, `downloads.read` for evidence exports.
|
||||
- **Feature flags:** `advisoryExplorer.enabled`, `vexExplorer.enabled`, `aggregation.conflictIndicators`.
|
||||
- **Dependencies:** Concelier WebService (aggregation API + delta metrics), Excititor WebService (consensus API + conflict feeds), Policy Engine explain hints (optional link-outs), Authority tenant enforcement.
|
||||
- **Offline behaviour:** Uses Offline Kit snapshots when gateway is in sealed mode; verify buttons queue until connectivity resumes.
|
||||
|
||||
---
|
||||
|
||||
## 2. Layout overview
|
||||
|
||||
```
+---------------------------------------------------------------------+
| Header: Tenant badge - global filters - status ticker - actions     |
+---------------------------------------------------------------------+
| Left rail: Saved views - provider filters - verification queue      |
+---------------------------------------------------------------------+
| Main split pane                                                     |
|   - Advisories tab (grid + detail drawer)                           |
|   - VEX tab (consensus table + claim drawer)                        |
| Tabs remember last active view per tenant.                          |
+---------------------------------------------------------------------+
```
|
||||
|
||||
The header reuses console-wide context chips (`Tenant`, `Severity`, `Source`, `Time`) and the status ticker that streams Concelier and Excititor deltas.
|
||||
|
||||
---
|
||||
|
||||
## 3. Advisory aggregation view
|
||||
|
||||
| Element | Description |
|---------|-------------|
| **Grid columns** | Vulnerability key (CVE/GHSA/vendor), Title, Source set, Last merged, Severity badge, KEV flag, Affected product count, Merge hash. |
| **Source chips** | Show contributing providers (NVD, Red Hat, Debian, vendor PSIRT). Hover reveals precedence order and timestamps. |
| **Severity** | Displays the highest severity declared by any source; tooltip lists per-source severities and vectors. |
| **KEV / Exploit status** | Badge highlights known exploited status from Concelier enrichment; links to KEV reference. |
| **Merge hash** | Deterministic hash from Concelier `merge_event`. Clicking copies hash and opens provenance banner. |
| **Filters** | Vulnerability identifier search, provider multi-select, severity picker, KEV toggle, affected product range slider, time window. |
| **List actions** | `Open detail`, `Copy CLI` (`stella advisory show ...`), `Compare sources`, `Queue verify`. |
|
||||
|
||||
The grid virtualises up to 15,000 advisories per tenant. Beyond that, the UI engages server-side pagination with cursor hints supplied by Concelier.
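
The **Severity** row in the table above describes a rollup: the badge shows the highest severity any source declares, while the tooltip keeps the per-source values. A minimal sketch of that rollup follows. The severity ordering and the function name `highest_severity` are assumptions for illustration; the real ranking comes from Concelier.

```python
from typing import Dict, Optional

# Assumed ordering, lowest to highest; not taken from Concelier itself.
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


def highest_severity(per_source: Dict[str, str]) -> Optional[str]:
    """Return the highest severity declared by any source, or None if none is recognised."""
    ranked = [
        value.lower()
        for value in per_source.values()
        if value.lower() in SEVERITY_ORDER
    ]
    if not ranked:
        return None
    return max(ranked, key=SEVERITY_ORDER.index)


# Per-source values stay available for the tooltip; only the badge is rolled up.
declared = {"nvd": "High", "redhat": "Medium", "debian": "Low"}
print(highest_severity(declared))
```

Note that this only selects a display value. It does not alter or merge the underlying source statements, in keeping with the Aggregation-Only Contract.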
|
||||
|
||||
---
|
||||
|
||||
## 4. Advisory detail drawer
|
||||
|
||||
Sections within the drawer:
|
||||
|
||||
1. **Summary cards** (title, published/modified timestamps, advisory merge hash, total sources, exploited flag).
|
||||
2. **Sources timeline** listing each contributing document with signature status, fetched timestamps, precedence rank, and quick links to raw view.
|
||||
3. **Affected products** table (product key, introduced/fixed, range semantics, distro qualifiers, notes). Column toggles allow switching between SemVer and distro notation.
|
||||
4. **Conflict indicators** show when sources disagree on fixed versions, severity, or affected sets. Each conflict row links to an explainer panel that describes the winning value, losing sources, and precedence rule.
|
||||
5. **References** collapsible list (patches, advisories, exploits).
|
||||
6. **Raw JSON** viewer (read-only) using canonical Concelier payload. Users can copy JSON or download via `GET /console/advisories/raw/{id}`.
|
||||
7. **CLI parity** card with commands:
   - `stella advisory show --tenant <tenant> --vuln <id>`
   - `stella advisory sources --tenant <tenant> --vuln <id>`
   - `stella advisory export --tenant <tenant> --vuln <id> --format cdx-json`
|
||||
|
||||
Provenance banner at the top indicates whether all sources are signed, partially signed, or unsigned, referencing AOC guardrails. Unsigned sources trigger a warning and link to the verification checklist.
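
The three banner states (all signed, partially signed, unsigned) follow directly from the per-source signature results. The sketch below shows one way to derive them; the state labels and the choice to treat an empty source list as unsigned are assumptions, not documented behaviour.

```python
from typing import Sequence


def provenance_state(signed_flags: Sequence[bool]) -> str:
    """Classify a set of sources by signature coverage.

    signed_flags holds one boolean per contributing source
    (True when that source's signature verified).
    """
    if not signed_flags:
        # Assumption: no sources means nothing is verifiably signed.
        return "unsigned"
    if all(signed_flags):
        return "signed"
    if any(signed_flags):
        return "partially-signed"
    return "unsigned"


print(provenance_state([True, True]))
print(provenance_state([True, False]))
print(provenance_state([False, False]))
```

Under this sketch, any state other than `signed` would be the trigger for the warning and the link to the verification checklist.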
|
||||
|
||||
---
|
||||
|
||||
## 5. VEX explorer
|
||||
|
||||
| Feature | Description |
|---------|-------------|
| **Consensus table** | Rows keyed by `(vulnId, productKey)` with rollup status (affected, not affected, fixed, under investigation), confidence score, provider count, and last evaluation timestamp. |
| **Status badges** | Colour-coded (red affected, green not affected, blue fixed, amber under investigation). Tooltips show justification and policy revision used. |
| **Provider breakdown** | Hover or expand to see source list with accepted/ignored flag, status, justification code, signature state, weight. |
| **Filters** | Product search (PURL), status filter, provider filter, justification codes, confidence threshold slider. |
| **Saved views** | Prebuilt presets: `Vendor consensus`, `Distro overrides`, `Conflicts`, `Pending investigation`. |
|
||||
|
||||
---
|
||||
|
||||
## 6. VEX detail drawer
|
||||
|
||||
Tabs within the drawer:
|
||||
|
||||
- **Consensus summary**: Restates rollup status, policy revision, confidence benchmarks, and referencing runs.
|
||||
- **Claims list**: Every raw claim from Excititor with provenance, signature result, justification, supersedes chain, evidence snippets. Claims are grouped by provider tier (vendor, distro, ecosystem, CERT).
|
||||
- **Conflict explainers**: For conflicting claims, shows why a claim was ignored (weight, stale timestamp, failing justification gate). Includes inline diff between competing claims.
|
||||
- **Events**: Timeline of claim arrivals and consensus evaluations with correlation IDs, accessible for debugging.
|
||||
- **Raw JSON**: Canonical `VexClaim` or `VexConsensus` payloads with copy/download. CLI parity callouts:
  - `stella vex consensus show --tenant <tenant> --vuln <id> --product <purl>`
  - `stella vex claims show --tenant <tenant> --vuln <id> --provider <provider>`
|
||||
|
||||
---
|
||||
|
||||
## 7. Raw viewers and provenance
|
||||
|
||||
- Raw viewers display canonical payloads with syntax highlighting and copy-as-JSON support.
|
||||
- Provenance banner presents: source URI, document digest, signature status, fetch timestamps, collector version.
|
||||
- Users can open raw documents in a modal that includes:
  - `sha256` digest with copy button
  - Signature verification summary (passing keys, missing signatures, errors)
  - `Download DSSE bundle` button when the document is attested
  - `Open in logs` link that copies search query (`correlationId=...`) for log aggregation tools.
|
||||
|
||||
All raw views are read-only to maintain Aggregation-Only guarantees.
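
Operators who download a raw document can check it against the `sha256` digest shown in the modal. The snippet below is a small sketch of that check using Python's standard `hashlib`; the `sha256:` prefix format and the helper names are assumptions, since the exact digest notation used by the console is not stated here.

```python
import hashlib


def sha256_digest(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload with an assumed 'sha256:' prefix."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def matches_digest(payload: bytes, expected: str) -> bool:
    """Compare a downloaded payload against the digest copied from the UI."""
    return sha256_digest(payload) == expected.strip().lower()


document = b'{"id": "example-advisory"}'  # placeholder content, not a real advisory
digest = sha256_digest(document)
print(matches_digest(document, digest))
```

A digest match only shows that the bytes are unchanged; signature verification remains a separate step.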
|
||||
|
||||
---
|
||||
|
||||
## 8. Conflict indicators and aggregation-not-merge UX
|
||||
|
||||
- Concelier retains every source; the UI surfaces conflicts rather than merging them.
|
||||
- Conflict badges appear in grids and detail views when sources disagree on affected ranges, fixed versions, severity, or exploit flags.
|
||||
- Clicking a badge opens the conflict explainer panel (powered by Concelier merge metadata) that lists winning/losing sources, ranks, and reasoning (e.g., "Vendor PSIRT overrides ecosystem advisory").
|
||||
- Excititor conflicts highlight discarded claims with reasons (stale, failing justification, low weight). Operators can override weights downstream via Policy Engine if needed.
|
||||
- UI copy explicitly reminds users that policy decisions happen elsewhere; these views show aggregated facts only.
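
To illustrate how a conflict badge could be derived without merging anything, the sketch below compares selected fields across source statements and reports the fields on which sources disagree. The field names, the input shape, and `find_conflicts` are hypothetical; the actual conflict metadata is produced by Concelier and Excititor, not by the UI.

```python
from typing import Dict, Iterable


def find_conflicts(
    statements: Dict[str, Dict[str, str]],
    fields: Iterable[str],
) -> Dict[str, Dict[str, str]]:
    """Return, per field, the per-source values when sources disagree.

    Sources that do not declare a field are ignored for that field.
    The input statements are never modified.
    """
    conflicts: Dict[str, Dict[str, str]] = {}
    for field in fields:
        values = {
            source: statement[field]
            for source, statement in statements.items()
            if field in statement
        }
        if len(set(values.values())) > 1:
            conflicts[field] = values
    return conflicts


# Illustrative data only; not taken from any real advisory.
statements = {
    "vendor-psirt": {"severity": "high", "fixed_version": "2.4.1"},
    "ecosystem": {"severity": "medium", "fixed_version": "2.4.1"},
}
print(find_conflicts(statements, ["severity", "fixed_version"]))
```

The result keeps every source's value side by side, which matches the aggregation-not-merge intent: the view reports disagreement and leaves resolution to precedence metadata and Policy Engine.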
|
||||
|
||||
---
|
||||
|
||||
## 9. Verification workflows
|
||||
|
||||
- **Run verify** buttons call Concelier or Excititor verification endpoints (`POST /console/advisories/verify`, `POST /console/vex/verify`) scoped by tenant and source filters.
|
||||
- Verification results appear as banners summarising documents checked, signatures verified, and guard violations.
|
||||
- Failed verifications show actionable error IDs (`ERR_AOC_00x`), matching CLI output.
|
||||
- Verification history accessible via the status ticker dropdown; entries include operator, scope, and correlation IDs.
|
||||
|
||||
---
|
||||
|
||||
## 10. Exports and automation
|
||||
|
||||
- Advisory tab exposes export actions: `Download normalized advisory`, `Download affected products CSV`, `Download source bundle` (raw documents packaged with manifest).
|
||||
- VEX tab supports exports for consensus snapshots, raw claims, and provider deltas.
|
||||
- Export manifests include merge hash or consensus digest, tenant ID, timestamp, and signature state.
|
||||
- CLI parity snippets accompany each export (e.g., `stella advisory export`, `stella vex export`).
|
||||
- Automation: copy buttons for webhook subscription (`/downloads/hooks/subscribe`) and ORAS push commands when using remote registries.
|
||||
|
||||
---
|
||||
|
||||
## 11. Observability and SSE updates
|
||||
|
||||
- Status ticker shows ingest lag (`advisory_delta_minutes`, `vex_delta_minutes`), last merge event hash, and verification queue depth.
|
||||
- Advisory and VEX grids refresh via SSE channels; updates animate row badges (new source, conflict resolved).
|
||||
- Metrics surfaced in drawers: ingestion age, signature pass rate, consensus evaluation duration.
|
||||
- Errors display correlation IDs linking to Concelier/Excititor logs.
|
||||
|
||||
---
|
||||
|
||||
## 12. Offline and air-gap behaviour
|
||||
|
||||
- When offline, list views display snapshot badge, staleness timer, and disable real-time verification.
|
||||
- Raw downloads reference local snapshot directories and include checksum instructions.
|
||||
- Exports queue locally; UI offers `Copy to removable media` instructions.
|
||||
- CLI parity switches to offline commands (`--offline`, `--snapshot`).
|
||||
- Tenant picker hides tenants not present in the snapshot to avoid partial data views.
|
||||
|
||||
---
|
||||
|
||||
## 13. Screenshot coordination
|
||||
|
||||
- Placeholders:
|
||||
- ``
|
||||
- ``
|
||||
- Coordinate with Console Guild to capture updated screenshots (dark and light themes) once Sprint 23 build candidate is tagged. Tracking in Slack channel `#console-screenshots` (entry 2025-10-26).
|
||||
|
||||
---
|
||||
|
||||
## 14. References
|
||||
|
||||
- `/docs/ui/console-overview.md` - shell, filters, tenant model.
|
||||
- `/docs/ui/navigation.md` - command palette, deep-link schema.
|
||||
- `/docs/ingestion/aggregation-only-contract.md` - AOC guardrails.
|
||||
- `/docs/architecture/CONCELIER.md` - merge rules, provenance.
|
||||
- `/docs/architecture/EXCITITOR.md` - VEX consensus model.
|
||||
- `/docs/security/console-security.md` - scopes, DPoP, CSP.
|
||||
- `/docs/cli-vs-ui-parity.md` - CLI equivalence matrix.
|
||||
|
||||
---
|
||||
|
||||
## 15. Compliance checklist
|
||||
|
||||
- [ ] Advisory grid columns, filters, and merge hash behaviour documented.
|
||||
- [ ] VEX consensus view covers status badges, provider breakdown, and filters.
|
||||
- [ ] Raw viewer and provenance banners explained with AOC alignment.
|
||||
- [ ] Conflict indicators and explainers tied to aggregation-not-merge rules.
|
||||
- [ ] Verification workflow and CLI parity documented.
|
||||
- [ ] Offline behaviour and automation paths captured.
|
||||
- [ ] Screenshot placeholders and coordination notes recorded.
|
||||
- [ ] References validated.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 23).*
|
||||
|
||||
130
docs/ui/console-overview.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# StellaOps Console – Overview
|
||||
|
||||
> **Audience:** Console product leads, Docs Guild writers, backend/API partners.
|
||||
> **Scope:** Information architecture, tenant scoping, global filters, and Aggregation‑Only Contract (AOC) alignment for the unified StellaOps Console that lands with Sprint 23.
|
||||
|
||||
The StellaOps Console is the single entry point for operators to explore SBOMs, advisories, policies, runs, and administrative surfaces. This overview explains how the console is organised, how users move between tenants, and how shared filters keep data views consistent across modules while respecting AOC boundaries.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Mission & Principles
|
||||
|
||||
- **Deterministic navigation.** Every route is stable and deep-link friendly. URLs carry enough context (tenant, filter tokens, view modes) to let operators resume work without reapplying filters.
|
||||
- **Tenant isolation first.** Any cross-tenant action requires fresh authority, and cross-tenant comparisons are made explicit so users never accidentally mix data sets.
|
||||
- **Aggregation-not-merge UX.** Console surfaces advisory and VEX rollups exactly as produced by Concelier and Excititor—no client-side re-weighting or mutation.
|
||||
- **Offline parity.** Every view has an offline equivalent powered by Offline Kit bundles or cached data, and exposes the staleness budget prominently.
|
||||
|
||||
---
|
||||
|
||||
## 2 · Information Architecture
|
||||
|
||||
### 2.1 Primary navigation
|
||||
|
||||
```
Console Root
├─ Dashboard          # KPIs, alerts, feed age, queue depth
├─ Findings           # Aggregated vulns + explanations (Policy Engine)
├─ SBOM Explorer      # Catalog, component graph, overlays
├─ Advisories & VEX   # Concelier / Excititor aggregation outputs
├─ Runs               # Scheduler runs, scan evidence, retry controls
├─ Policies           # Editor, simulations, approvals
├─ Downloads          # Signed artifacts, Offline Kit parity
├─ Admin              # Tenants, roles, tokens, integrations
└─ Help & Tours       # Contextual docs, guided walkthroughs
```
|
||||
|
||||
Routes lazy-load feature shells so the UI can grow without increasing first-paint cost. Each feature owns its sub-navigation and exposes a `KeyboardShortcuts` modal describing the available accelerators.
|
||||
|
||||
### 2.2 Shared surfaces
|
||||
|
||||
| Surface | Purpose | Notes |
|---------|---------|-------|
| **Top bar** | Shows active tenant, environment badge (prod/non-prod), offline status pill, user menu, notifications inbox, and the command palette trigger (`⌘/Ctrl K`). | Offline status turns amber when data staleness exceeds configured thresholds. |
| **Global filter tray** | Expands from the right edge (`Shift F`). Hosts universal filters (tenant, time window, tags, severity) that apply across compatible routes. | Filter tray remembers per-tenant presets; stored in IndexedDB (non-sensitive). |
| **Context chips** | Display active global filters underneath page titles, with one-click removal (`⌫`). | Chips include the origin (e.g., `Tenant: west-prod`). |
| **Status ticker** | SSE-driven strip that surfaces Concelier/Excititor ingestion deltas, scheduler lag, and attestor queue depth. | Pulls from `/console/status` proxy (see WEB-CONSOLE-23-002). |
|
||||
|
||||
---
|
||||
|
||||
## 3 · Tenant Model
|
||||
|
||||
| Aspect | Detail |
|--------|--------|
| **Tenant sources** | The console obtains the tenant list and metadata from Authority `/v1/tenants` after login. Tenant descriptors include display name, slug, environment tag, and RBAC hints (role mask). |
| **Selection workflow** | First visit prompts for a default tenant. Afterwards, the tenant picker (`⌘/Ctrl T`) switches context without full reload, issuing `Authorization` refresh with the new tenant scope. |
| **Token handling** | Each tenant change generates a short-lived, DPoP-bound access token (`aud=console`, `tenant=<id>`). Tokens live in memory; metadata persists in `sessionStorage` for reload continuity. |
| **Cross-tenant comparisons** | Side-by-side dashboards (Dashboard, Findings, SBOM Explorer) allow multi-tenant comparison only via explicit *"Add tenant"* control. Requests issue parallel API calls with separate tokens; results render in split panes labelled per tenant. |
| **Fresh-auth gated actions** | Admin and policy approvals call `Authority /fresh-auth` before executing. UI enforces a 5-minute window; afterwards, actions remain visible but disabled pending re-auth. |
| **Audit trail** | Tenant switches emit structured logs (`action=ui.tenant.switch`, `tenantId`, `subject`, `previousTenant`) and appear in Authority audit exports. |
|
||||
|
||||
### 3.1 Offline operation
|
||||
|
||||
In offline or sealed environments, the tenant picker only lists tenants bundled within the Offline Kit snapshot. Switching tenants prompts an "offline snapshot" banner showing the snapshot timestamp. Actions that require round-trips to Authority (fresh-auth, token rotation) show guidance to perform the step on an online bastion and import credentials later.
|
||||
|
||||
---
|
||||
|
||||
## 4 · Global Filters & Context Tokens
|
||||
|
||||
| Filter | Applies To | Source & Behaviour |
|--------|------------|--------------------|
| **Tenant** | All modules | Primary isolation control. Stored in URL (`?tenant=`) and via `x-tenant-id` header injected by the web proxy. Changes invalidate cached data stores. |
| **Time window** | Dashboard, Findings, Advisories & VEX, Runs | Options: `24 h`, `7 d`, `30 d`, custom ISO range. Default aligns with Compliance/Authority reporting window. Shared via query param `since=`/`until=`. |
| **Severity / Impact** | Findings, Advisories & VEX, SBOM Explorer overlays | Multi-select (Critical/High/Medium/Low/Informational, plus `Exploited` tag). Values map to Policy Engine impact buckets and Concelier KEV flags. |
| **Component tags** | SBOM Explorer, Findings | Tags drawn from SBOM metadata (`component.tags[]`). Includes search-as-you-type with scoped suggestions (package type, supplier, license). |
| **Source providers** | Advisories & VEX | Filter by provider IDs (e.g., NVD, GHSA, vendor VEX). Tied to Aggregation-Only provenance; filtering never alters base precedence. |
| **Run status** | Runs, Dashboard | States: `queued`, `running`, `completed`, `failed`, `cancelled`. Pulled from Scheduler SSE stream; default shows non-terminal states. |
| **Policy view** | Findings, Policies | Toggles between Active policy, Staged policy, and Simulation snapshots. Selecting Simulation requires prior simulation run; console links to create one if absent. |
|
||||
|
||||
Filters emit deterministic tokens placed in the URL hash for copy/paste parity with CLI commands (see `/docs/cli-vs-ui-parity.md`). The console warns when a filter combination has no effect on the current view and offers to reset to defaults.
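
"Deterministic" here means that the same filter selection always yields the same token, regardless of the order in which filters were applied. The sketch below shows one way to get that property by sorting keys and multi-select values before encoding. The encoding scheme and the `filter_token` helper are assumptions for illustration; the console's real token format is defined elsewhere.

```python
from typing import Dict, Iterable, Union
from urllib.parse import urlencode

FilterValue = Union[str, Iterable[str]]


def filter_token(filters: Dict[str, FilterValue]) -> str:
    """Encode filters into an order-independent, URL-safe token."""
    pairs = []
    for key in sorted(filters):
        value = filters[key]
        if not isinstance(value, str):
            # Multi-select values are sorted so selection order does not matter.
            value = ",".join(sorted(value))
        pairs.append((key, value))
    return urlencode(pairs)


a = filter_token({"tenant": "west-prod", "severity": ["high", "critical"]})
b = filter_token({"severity": ["critical", "high"], "tenant": "west-prod"})
print(a == b)
```

Because the token is stable, a copied URL and a CLI invocation built from the same filters can be compared directly.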
|
||||
|
||||
### 4.1 Presets & Saved Views
|
||||
|
||||
Users can save a set of global filters as named presets (stored per tenant). Presets show up in the command palette and the dashboard landing cards for quick access (`⌘/Ctrl 1..9`).

---

## 5 · Aggregation-Only Alignment

- **Read-only aggregation.** Pages that list advisories or VEX claims consume the canonical aggregation endpoints (`/console/advisories`, `/console/vex`). They never merge or reconcile records client-side. Instead, they highlight the source lineage and precedence as supplied by Concelier and Excititor.
- **Consistency indicators.** Each aggregated item displays source badges, precedence order, and a "last merge event hash" so operators can cross-reference Concelier logs. When a source is missing or stale, the UI surfaces a provenance banner linking to the raw document.
- **AOC guardrails.** Workflow actions (e.g., "request verify", "download evidence bundle") route through Concelier WebService guard endpoints that enforce Aggregation-Only rules. UI strings reinforce that policy decisions happen in Policy Engine, not here.
- **Audit alignment.** Any cross-navigation from aggregated data into findings or policies preserves the underlying IDs so analysts can track how aggregated data influences policy verdicts without altering the data itself.
- **CLI parity.** Inline callouts copy the equivalent `stella` CLI commands, ensuring console users can recreate the exact aggregation query offline.

---

## 6 · Performance & Telemetry Anchors

- Initial boot target: **< 2.5 s** `LargestContentfulPaint` on a 4 vCPU air-gapped runner with cached assets.
- Route budget: each feature shell must keep first interaction (hydrated data + filters) under **1.5 s** once tokens resolve.
- Telemetry: the console emits metrics via the `/console/telemetry` batch endpoint: `ui_route_render_seconds`, `ui_filter_apply_total`, `ui_tenant_switch_total`, `ui_offline_banner_seconds`. Logs carry correlation IDs matching backend responses for unified tracing.
- Lighthouse CI runs in the console pipeline (see `DEVOPS-CONSOLE-23-001`) and asserts the budgets above; failing runs gate releases.

---

## 7 · References

- `/docs/architecture/console.md` – component-level diagrams (pending Sprint 23 task).
- `/docs/ui/navigation.md` – detailed routes, breadcrumbs, keyboard shortcuts.
- `/docs/ui/downloads.md` – downloads manifest, parity workflows, offline guidance.
- `/docs/ui/sbom-explorer.md` – SBOM-specific flows and overlays.
- `/docs/ui/advisories-and-vex.md` – aggregation UX details.
- `/docs/ui/findings.md` – explain drawer and filter matrix.
- `/docs/security/console-security.md` – OIDC, scopes, CSP, evidence handling.
- `/docs/cli-vs-ui-parity.md` – CLI equivalents and regression automation.

---

## 8 · Compliance Checklist

- [ ] Tenant picker enforces Authority-issued scopes and logs `ui.tenant.switch`.
- [ ] Global filters update URLs/query tokens for deterministic deep links.
- [ ] Aggregation views show provenance badges and merge hash indicators.
- [ ] CLI parity callouts aligned with `stella` commands for equivalent queries.
- [ ] Offline banner tested with Offline Kit snapshot import and documented staleness thresholds.
- [ ] Accessibility audit covers global filter tray, tenant picker, and keyboard shortcuts (WCAG 2.2 AA).
- [ ] Telemetry and Lighthouse budgets tracked in console CI (`DEVOPS-CONSOLE-23-001`).

---

*Last updated: 2025-10-26 (Sprint 23).*
143
docs/ui/console.md
Normal file
@@ -0,0 +1,143 @@
# Console AOC Dashboard

> **Audience:** Console PMs, UI engineers, Concelier/Excititor operators, SREs monitoring ingestion health.
> **Scope:** Layout, RBAC, workflow, and observability for the Aggregation-Only Contract (AOC) dashboard that ships with Sprint 19.

The Console AOC dashboard gives operators a live view of ingestion guardrails across all configured sources. It surfaces raw Concelier/Excititor health, highlights violations raised by `AOCWriteGuard`, and lets on-call staff trigger verification without leaving the browser. Use it alongside the [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md) and the [architecture overview](../architecture/overview.md) when rolling out AOC changes.

---

## 1 · Access & prerequisites

- **Route:** `/console/sources` (dashboard) with contextual drawer routes `/console/sources/:sourceKey` and `/console/sources/:sourceKey/violations/:documentId`.
- **Feature flag:** `aocDashboard.enabled` (default `true` once Concelier WebService exposes `/aoc/verify`). Toggle is tenant-scoped to support phased rollout.
- **Scopes:**
  - `ui.read` (base navigation) and `advisory:verify` to view ingestion stats/violations.
  - `vex:verify` to see Excititor entries and run VEX verifications.
  - `advisory:write` / `vex:write` **not** required; dashboard uses read-only APIs.
- **Tenancy:** All data is filtered by the active tenant selector. Switching tenants re-fetches tiles and drill-down tables with tenant-scoped tokens.
- **Back-end contracts:** Requires Concelier/Excititor 19.x (AOC guards enabled) and Authority scopes updated per [Authority service docs](../ARCHITECTURE_AUTHORITY.md#new-aoc-scopes).
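
The scope rules above can be read as a small gating function. The sketch below is illustrative only: the scope strings come from the list, while `DashboardAccess`, `resolve_access`, and the idea that `ui.read` alone opens an empty dashboard are assumptions, not the console's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DashboardAccess:
    """Hypothetical view of what a token may see on /console/sources."""
    can_open: bool             # base navigation is available
    can_view_advisories: bool  # Concelier ingestion stats and violations
    can_view_vex: bool         # Excititor entries and VEX verification


def resolve_access(scopes, flag_enabled=True):
    """Map granted scopes to dashboard capabilities.

    Assumption: the tenant-scoped feature flag gates everything, and
    `ui.read` is a prerequisite for the two verify scopes.
    No write scope is consulted, matching the read-only contract.
    """
    base = bool(flag_enabled) and "ui.read" in scopes
    return DashboardAccess(
        can_open=base,
        can_view_advisories=base and "advisory:verify" in scopes,
        can_view_vex=base and "vex:verify" in scopes,
    )
```

For example, a token carrying only `ui.read` and `advisory:verify` would see Concelier tiles but no Excititor tiles under these assumptions.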

---

## 2 · Layout overview

```
┌────────────────────────────────────────────────────────────────────────────┐
│ Header: tenant picker • live status pill • Last verify (“2h ago”) │
├────────────────────────────────────────────────────────────────────────────┤
│ Tile grid (4 per row) │
│ ┌───── Concelier sources ─────┐ ┌────── Excititor sources ────────┐ │
│ │ Red Hat | Ubuntu | OSV ... │ │ Vendor VEX | CSAF feeds ... │ │
├────────────────────────────────────────────────────────────────────────────┤
│ Violations & history table │
│ • Filters: timeframe, source, ERR_AOC code, severity (warning/block) │
│ • Columns: timestamp, source, code, summary, supersedes link, actions │
├────────────────────────────────────────────────────────────────────────────┤
│ Action bar: Run Verify • Download CSV • Open Concelier raw doc • Help │
└────────────────────────────────────────────────────────────────────────────┘
```

Tiles summarise the latest ingestion runs. The table and drawers provide drill-down views, and the action bar launches verifier workflows or exports evidence for audits.

---

## 3 · Source tiles

Each tile represents a Concelier or Excititor source and contains the fields below.

| Field | Description | Thresholds & colours |
| ------ | ----------- | -------------------- |
| **Status badge** | Aggregated health computed from the latest job. | `Healthy` (green) when last job finished < 30 min ago and `violations24h = 0`; `Warning` (amber) when age ≥ 30 min or ≤ 5 violations; `Critical` (red) on any guard rejection (`ERR_AOC_00x`) or if job age > 2 h. |
| **Last ingest** | Timestamp and relative age of last successful append to `advisory_raw`/`vex_raw`. | Clicking opens job detail drawer. |
| **Violations (24 h)** | Count of guard failures grouped by `ERR_AOC` code across the last 24 hours. | Shows pill per code (e.g., `ERR_AOC_001 × 2`). |
| **Supersedes depth** | Average length of supersedes chain for the source over the last day. | Helps spot runaway revisions. |
| **Signature pass rate** | % of documents where signature/checksum verification succeeded. | Derived from `ingestion_signature_verified_total`. |
| **Latency P95** | Write latency recorded by ingestion spans / histograms. | Mirrors `ingestion_latency_seconds{quantile=0.95}`. |
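
The status badge thresholds in the table can be sketched as a single classification function. This is one possible reading, not the console's confirmed logic: the table does not say how a block-level guard rejection is distinguished from a warning-level violation, nor what happens above five violations, so `guard_rejected` and the escalation to `Critical` above five are labelled assumptions.

```python
from datetime import timedelta

HEALTHY_MAX_AGE = timedelta(minutes=30)  # "last job finished < 30 min ago"
CRITICAL_AGE = timedelta(hours=2)        # "job age > 2 h"
WARNING_MAX_VIOLATIONS = 5               # "<= 5 violations"


def tile_status(job_age, violations_24h, guard_rejected):
    """Classify a source tile as Healthy, Warning, or Critical.

    `guard_rejected` is a hypothetical flag meaning at least one
    block-severity ERR_AOC_00x rejection was recorded.
    """
    if guard_rejected or job_age > CRITICAL_AGE:
        return "Critical"
    # Assumption: the table is silent above five violations; treat as Critical.
    if violations_24h > WARNING_MAX_VIOLATIONS:
        return "Critical"
    if job_age < HEALTHY_MAX_AGE and violations_24h == 0:
        return "Healthy"
    return "Warning"
```

Ordering the checks from most to least severe keeps the overlapping conditions (for example, a stale job that also has violations) unambiguous.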

Tile menus expose quick actions:

- **View history** – jumps to table filtered by the selected source.
- **Open metrics** – deep links to Grafana panel seeded with `source=<key>` for `ingestion_write_total` and `aoc_violation_total`.
- **Download raw sample** – fetches the most recent document via `GET /advisories/raw/{id}` (or VEX equivalent) for debugging.

---

## 4 · Violation drill-down workflow

1. **Select a tile** or use table filters to focus on a source, timeframe, or `ERR_AOC` code.
2. **Inspect the violation row:** summary shows offending field, guard code, and document hash.
3. **Open detail drawer:** reveals provenance (source URI, signature info), supersedes chain, and raw JSON (redacted secrets). Drawer also lists linked `effective_finding_*` entries if Policy Engine has already materialised overlays.
4. **Remediate / annotate:** operators can add notes (stored as structured annotations) or flag as *acknowledged* (for on-call rotations). Annotations sync to Concelier audit logs.
5. **Escalate:** “Create incident” button opens the standard incident template pre-filled with context (requires `ui.incidents` scope).

The drill-down retains filter state, so back navigation returns to the scoped table without reloading the entire dashboard.

---

## 5 · Verification & actions

- **Run Verify:** calls `POST /aoc/verify` with the chosen `since` window (default 24 h). UI displays summary cards (documents checked, violations found, top codes) and stores reports for 7 days. Results include a downloadable JSON manifest mirroring CLI output.
- **Schedule verify:** the schedule modal configures automated verification (daily/weekly) and optional email/Notifier hooks.
- **Export evidence:** CSV/JSON export buttons include tile metrics, verification summaries, and violation annotations, which is useful for audits.
- **Open in CLI:** copies `stella aoc verify --tenant <tenant> --since <window>` for parity with automation scripts.

All verify actions are scoped by tenant and recorded in Authority audit logs (`action=aoc.verify.ui`).

---

## 6 · Metrics & observability

The dashboard consumes the same metrics emitted by Concelier/Excititor (documented in the [AOC reference](../ingestion/aggregation-only-contract.md#9-observability-and-diagnostics)):

- `ingestion_write_total{source,tenant,result}` – populates success/error sparklines beneath each tile.
- `aoc_violation_total{source,tenant,code}` – feeds violation pills and trend chart.
- `ingestion_signature_verified_total{source,result}` – renders signature pass-rate gauge.
- `ingestion_latency_seconds{source,quantile}` – used for latency badges and alert banners.
- `advisory_revision_count{source}` – displayed in supersedes depth tooltip.

The page shows the correlation ID for each violation entry, matching structured logs emitted by Concelier and Excititor, enabling quick log pivoting.
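
As an illustration of how the signature pass-rate gauge could be derived from `ingestion_signature_verified_total{source,result}`, here is a minimal sketch. The `result` label values (`"ok"` and anything else counting as a failure) and the in-memory `samples` mapping are assumptions; a real dashboard would more likely compute this from counter increases over a time window in the metrics backend.

```python
def signature_pass_rate(samples, source):
    """Return the pass rate in percent for one source, or None without data.

    `samples` maps (source, result) label pairs to counter values.
    Assumption: result == "ok" marks a successful verification.
    """
    total = sum(value for (src, _result), value in samples.items() if src == source)
    if total == 0:
        return None  # no documents seen; avoid dividing by zero
    passed = samples.get((source, "ok"), 0)
    return 100.0 * passed / total
```

Returning `None` rather than `0.0` for an idle source lets the tile render "no data" instead of a misleading failure.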

---

## 7 · Security & tenancy

- Tokens are DPoP-bound; every API call includes the UI’s DPoP proof and inherits tenant scoping from Authority.
- Violations drawer hides sensitive fields (credentials, private keys) using the same redaction rules as Concelier events.
- Run Verify honours rate limits to avoid overloading ingestion services; repeated failures trigger a cool-down banner.
- The dashboard never exposes derived severity or policy status, only raw ingestion facts and guard results, preserving AOC separation of duties.

---

## 8 · Offline & air-gap behaviour

- In sealed/offline mode the dashboard switches to an **“offline snapshot”** banner, reading from Offline Kit snapshots seeded via `ouk` imports.
- Verification requests queue until connectivity resumes; UI provides `Download script` to run `stella aoc verify` on a workstation and upload results later.
- Tiles display the timestamp of the last imported snapshot and flag when it exceeds the configured staleness threshold (default 48 h offline).
- CSV/JSON exports include checksums so operators can transfer evidence across air gaps securely.
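
The staleness flag on offline tiles amounts to a timestamp comparison. The following sketch assumes the snapshot import time is available as a timezone-aware `datetime`; the function and constant names are hypothetical, and only the 48 h default comes from the text.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_STALENESS = timedelta(hours=48)  # documented default threshold


def is_snapshot_stale(imported_at, now, threshold=DEFAULT_STALENESS):
    """True when the last imported snapshot is older than the threshold.

    "Exceeds" is read strictly, so a snapshot exactly 48 h old is not stale.
    """
    return now - imported_at > threshold
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the check deterministic in tests.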

---

## 9 · Related references

- [Aggregation-Only Contract reference](../ingestion/aggregation-only-contract.md)
- [Architecture overview](../architecture/overview.md)
- [Concelier architecture](../ARCHITECTURE_CONCELIER.md)
- [Excititor architecture](../ARCHITECTURE_EXCITITOR.md)
- [CLI AOC commands](../cli/cli-reference.md)

---

## 10 · Compliance checklist

- [ ] Dashboard wired to live AOC metrics (`ingestion_*`, `aoc_violation_total`).
- [ ] Verify action logs to Authority audit trail with tenant context.
- [ ] UI enforces read-only access to raw stores; no mutation endpoints invoked.
- [ ] Offline/air-gap mode documented and validated with Offline Kit snapshots.
- [ ] Violation exports include provenance and `ERR_AOC_00x` codes.
- [ ] Accessibility tested (WCAG 2.2 AA) for tiles, tables, and drawers.
- [ ] Screenshot/recording captured for Docs release notes (pending UI capture).

---

*Last updated: 2025-10-26 (Sprint 19).*
212
docs/ui/downloads.md
Normal file
@@ -0,0 +1,212 @@
# StellaOps Console - Downloads Manager

> **Audience:** DevOps guild, Console engineers, enablement writers, and operators who promote releases or maintain offline mirrors.
> **Scope:** `/console/downloads` workspace covering artifact catalog, signed manifest plumbing, export status handling, CLI parity, automation hooks, and offline guidance (Sprint 23).

The Downloads workspace centralises every artefact required to deploy or validate StellaOps in connected and air-gapped environments. It keeps Console operators aligned with release engineering by surfacing the signed downloads manifest, live export jobs, parity checks against Offline Kit bundles, and automation hooks that mirror the CLI experience.

---

## 1 - Access and prerequisites

- **Route:** `/console/downloads` (list) with detail drawer `/console/downloads/:artifactId`.
- **Scopes:** `downloads.read` (baseline) and `downloads.manage` for cancelling or expiring stale exports. Evidence bundles inherit the originating scope (`runs.read`, `findings.read`, etc.).
- **Dependencies:** Web gateway `/console/downloads` API (WEB-CONSOLE-23-005), DevOps manifest pipeline (`deploy/downloads/manifest.json`), Offline Kit metadata (`manifest/offline-manifest.json`), and export orchestrator `/console/exports`.
- **Feature flags:** `downloads.workspace.enabled`, `downloads.exportQueue`, `downloads.offlineParity`.
- **Tenancy:** Artefacts are tenant-agnostic except evidence bundles, which are tagged with originating tenant and require matching Authority scopes.

---

## 2 - Workspace layout

```
+---------------------------------------------------------------+
| Header: Snapshot timestamp - Manifest signature status |
+---------------------------------------------------------------+
| Cards: Latest release - Offline kit parity - Export queue |
+---------------------------------------------------------------+
| Tabs: Artefacts | Exports | Offline Kits | Webhooks |
+---------------------------------------------------------------+
| Filter bar: Channel - Kind - Architecture - Scope tags |
+---------------------------------------------------------------+
| Table (virtualised): Artifact | Channel | Digest | Status |
| Detail drawer: Metadata | Commands | Provenance | History |
+---------------------------------------------------------------+
```

- **Snapshot banner:** shows `manifest.version`, `generatedAt`, and cosign verification state. If verification fails, the banner turns red and links to troubleshooting guidance.
- **Quick actions:** Copy manifest URL, download attestation bundle, trigger parity check, open CLI parity doc (`/docs/cli-vs-ui-parity.md`).
- **Filters:** allow narrowing by channel (`edge`, `stable`, `airgap`), artefact kind (`container.image`, `helm.chart`, `compose.bundle`, `offline.bundle`, `export.bundle`), architecture (`linux/amd64`, `linux/arm64`), and scope tags (`console`, `scheduler`, `authority`).
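
The filter bar can be pictured as a predicate over manifest entries. The sketch below is a hedged illustration: it assumes entries are plain dictionaries with `channel`, `kind`, `architectures`, and `tags` keys, that scope tags map onto the `tags` array, that every requested tag must be present, and that arch-agnostic entries pass any architecture filter. None of these choices is confirmed by the console implementation.

```python
def filter_artifacts(artifacts, channel=None, kind=None, architecture=None, tags=()):
    """Return the artefacts matching every filter that was supplied."""
    wanted_tags = set(tags)
    matches = []
    for artifact in artifacts:
        if channel and artifact.get("channel") != channel:
            continue
        if kind and artifact.get("kind") != kind:
            continue
        archs = artifact.get("architectures") or []
        # Assumption: an empty architectures list means arch-agnostic.
        if architecture and archs and architecture not in archs:
            continue
        # Assumption: all requested scope tags must be present (AND semantics).
        if not wanted_tags.issubset(artifact.get("tags", [])):
            continue
        matches.append(artifact)
    return matches
```

With these semantics, filtering on `linux/arm64` keeps the Offline Kit bundle visible because it declares no architectures at all.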

---

## 3 - Artefact catalogue

| Category | Artefacts surfaced | Source | Notes |
|----------|-------------------|--------|-------|
| **Core containers** | `stellaops/web-ui`, `stellaops/web`, `stellaops/concelier`, `stellaops/excititor`, `stellaops/scanner-*`, `stellaops/authority`, `stellaops/attestor`, `stellaops/scheduler-*` | `deploy/downloads/manifest.json` (`artifacts[].kind = "container.image"`) | Digest-only pulls with copy-to-clipboard `docker pull` and `oras copy` commands; badges show arch availability. |
| **Helm charts** | `deploy/helm/stellaops-*.tgz` plus values files | Manifest entries where `kind = "helm.chart"` | Commands reference `helm repo add` (online) and `helm install --values` (offline). UI links to values matrix in `/docs/install/helm-prod.md` when available. |
| **Compose bundles** | `deploy/compose/docker-compose.*.yaml`, `.env` seeds | `kind = "compose.bundle"` | Inline diff viewer highlights digest changes vs previous snapshot; `docker compose pull` command copies digest pins. |
| **Offline kit** | `stella-ops-offline-kit-<ver>-<channel>.tar.gz` + signatures and manifest | Offline Kit metadata (`manifest/offline-manifest.json`) merged into downloads view | Drawer shows bundle size, signed manifest digest, cosign verification command (mirrors `/docs/24_OFFLINE_KIT.md`). |
| **Evidence exports** | Completed jobs from `/console/exports` (findings delta, policy explain, run evidence) | Export orchestrator job queue | Entries expire after retention window; UI exposes `stella runs export` and `stella findings export` parity buttons. |
| **Webhooks & parity** | `/downloads/hooks/subscribe` configs, CI parity reports | Manifest extras (`kind = "webhook.config"`, `kind = "parity.report"`) | Operators can download webhook payload templates and review the latest CLI parity check report generated by docs CI. |

---

## 4 - Manifest structure

The DevOps pipeline publishes a deterministic manifest at `deploy/downloads/manifest.json`, signed with the release Cosign key (`DOWNLOADS-CONSOLE-23-001`). The Console fetches it on workspace load and caches it with `If-None-Match` headers to avoid redundant pulls. The manifest schema:

- **`version`** - monotonically increasing integer tied to pipeline run.
- **`generatedAt`** - ISO-8601 UTC timestamp.
- **`signature`** - URL to detached Cosign signature (`manifest.json.sig`).
- **`artifacts[]`** - ordered list keyed by `id`.

Each artefact contains:

| Field | Description |
|-------|-------------|
| `id` | Stable identifier (`<type>:<name>:<version>`). |
| `kind` | One of `container.image`, `helm.chart`, `compose.bundle`, `offline.bundle`, `export.bundle`, `webhook.config`, `parity.report`. |
| `channel` | `edge`, `stable`, or `airgap`. |
| `version` | Semantic or calendar version (for containers, matches release manifest). |
| `architectures` | Array of supported platforms (empty for arch-agnostic artefacts). |
| `digest` | SHA-256 for immutable artefacts; Compose bundles include file hash. |
| `sizeBytes` | File size (optional for export bundles that stream). |
| `downloadUrl` | HTTPS endpoint (registry, object store, or mirror). |
| `signatureUrl` | Detached signature (Cosign, DSSE, or attestation) if available. |
| `sbomUrl` | Optional SBOM pointer (CycloneDX JSON). |
| `attestationUrl` | Optional in-toto/SLSA attestation. |
| `docs` | Array of documentation links (e.g., `/docs/install/docker.md`). |
| `tags` | Free-form tags (e.g., `["console","ui","offline"]`). |
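
A consumer of the manifest might sanity-check each entry against the field table before rendering it. The sketch below is not the Console's validator: the allowed `kind` and `channel` values come from the table, but the choice of which fields count as required and the `sha256:<64 hex>` digest pattern are assumptions made for illustration.

```python
import re

ALLOWED_KINDS = {
    "container.image", "helm.chart", "compose.bundle", "offline.bundle",
    "export.bundle", "webhook.config", "parity.report",
}
ALLOWED_CHANNELS = {"edge", "stable", "airgap"}
# Assumption: these fields are treated as mandatory for this sketch.
REQUIRED_FIELDS = ("id", "kind", "channel", "version", "digest", "downloadUrl")
SHA256_RE = re.compile(r"sha256:[0-9a-f]{64}")


def validate_artifact(entry):
    """Return a list of human-readable problems; empty means the entry looks sane."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    kind = entry.get("kind")
    if kind is not None and kind not in ALLOWED_KINDS:
        problems.append(f"unknown kind: {kind}")
    channel = entry.get("channel")
    if channel is not None and channel not in ALLOWED_CHANNELS:
        problems.append(f"unknown channel: {channel}")
    digest = entry.get("digest")
    if digest is not None and not SHA256_RE.fullmatch(digest):
        problems.append("digest is not of the form sha256:<64 hex chars>")
    return problems
```

Collecting every problem instead of raising on the first one makes it easier to surface a complete report next to a suspect manifest.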

### 4.1 Example excerpt

```json
{
  "version": 42,
  "generatedAt": "2025-10-27T04:00:00Z",
  "signature": "https://downloads.stella-ops.org/manifest/manifest.json.sig",
  "artifacts": [
    {
      "id": "container.image:web-ui:2025.10.0-edge",
      "kind": "container.image",
      "channel": "edge",
      "version": "2025.10.0-edge",
      "architectures": ["linux/amd64", "linux/arm64"],
      "digest": "sha256:38b225fa7767a5b94ebae4dae8696044126aac429415e93de514d5dd95748dcf",
      "sizeBytes": 187563210,
      "downloadUrl": "https://registry.stella-ops.org/v2/stellaops/web-ui/manifests/sha256:38b225fa7767a5b94ebae4dae8696044126aac429415e93de514d5dd95748dcf",
      "signatureUrl": "https://downloads.stella-ops.org/signatures/web-ui-2025.10.0-edge.cosign.sig",
      "sbomUrl": "https://downloads.stella-ops.org/sbom/web-ui-2025.10.0-edge.cdx.json",
      "attestationUrl": "https://downloads.stella-ops.org/attestations/web-ui-2025.10.0-edge.intoto.jsonl",
      "docs": ["/docs/install/docker.md", "/docs/security/console-security.md"],
      "tags": ["console", "ui"]
    },
    {
      "id": "offline.bundle:ouk:2025.10.0-edge",
      "kind": "offline.bundle",
      "channel": "edge",
      "version": "2025.10.0-edge",
      "digest": "sha256:4f7d2f7a8d0cf4b5f3af689f6c74cd213f4c1b3a1d76d24f6f9f3d9075e51f90",
      "downloadUrl": "https://downloads.stella-ops.org/offline/stella-ops-offline-kit-2025.10.0-edge.tar.gz",
      "signatureUrl": "https://downloads.stella-ops.org/offline/stella-ops-offline-kit-2025.10.0-edge.tar.gz.sig",
      "sbomUrl": "https://downloads.stella-ops.org/offline/offline-manifest-2025.10.0-edge.json",
      "docs": ["/docs/24_OFFLINE_KIT.md"],
      "tags": ["offline", "airgap"]
    }
  ]
}
```

Console caches the manifest hash and surfaces differences when a new version lands, helping operators confirm digests drift only when expected.
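
One way to surface differences between two manifest versions is to compare digests keyed by artefact `id`. This is a sketch under the assumption that both manifests follow the structure shown above; `diff_manifests` is a hypothetical helper, not part of the Console.

```python
def diff_manifests(old, new):
    """Compare two parsed manifests and report added, removed, and changed ids."""
    old_digests = {a["id"]: a.get("digest") for a in old.get("artifacts", [])}
    new_digests = {a["id"]: a.get("digest") for a in new.get("artifacts", [])}
    shared = old_digests.keys() & new_digests.keys()
    return {
        "added": sorted(new_digests.keys() - old_digests.keys()),
        "removed": sorted(old_digests.keys() - new_digests.keys()),
        # Same id but a different digest: the drift operators want to notice.
        "changed": sorted(i for i in shared if old_digests[i] != new_digests[i]),
    }
```

Because the `id` embeds the version, a routine release would normally appear under `added` and `removed`; an entry under `changed` means an existing id now resolves to a different digest, which is the case worth investigating.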

---

## 5 - Download workflows and statuses

| Status | Applies to | Behaviour |
|--------|------------|-----------|
| **Ready** | Immutable artefacts (images, Helm/Compose bundles, offline kit) | Commands available immediately. Digest, size, and last verification timestamp display in the table. |
| **Pending export** | Async exports queued via `/console/exports` | Shows job owner, scope, and estimated completion time. UI polls every 15 s and updates progress bar. |
| **Processing** | Long-running export (evidence bundle, large SBOM) | Drawer shows current stage (`collecting`, `compressing`, `signing`). Operators can cancel if they own the request and hold `downloads.manage`. |
| **Delivered** | Completed export within retention window | Provides download links, resume token, and parity snippet for CLI. |
| **Expired** | Export past retention or manually expired | Row grays out; clicking opens housekeeping guidance with CLI command to regenerate (`stella runs export --run <id>`). |

Exports inherit retention defaults defined in policy (`downloads.retentionDays`, min 3, max 30). Operators can override per tenant if they have the appropriate scope.
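
The retention bounds and the Delivered/Expired transition can be sketched as follows. The 3 and 30 day bounds come from the text; clamping an out-of-range override (rather than rejecting it) and the helper names are assumptions.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 3
MAX_RETENTION_DAYS = 30


def effective_retention_days(requested_days):
    """Clamp a tenant override into the documented 3..30 day range."""
    return max(MIN_RETENTION_DAYS, min(MAX_RETENTION_DAYS, requested_days))


def export_status(completed_at, now, retention_days):
    """Return "Delivered" while inside the retention window, else "Expired"."""
    window = timedelta(days=effective_retention_days(retention_days))
    return "Expired" if now >= completed_at + window else "Delivered"
```

Manual expiry by an operator holding `downloads.manage` is outside this sketch; it would simply force the `Expired` state regardless of the timestamps.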

---

## 6 - CLI parity and copy-to-clipboard

- **Digest pulls:** Each container entry exposes `docker pull <image>@<digest>` and `oras copy <image>@<digest> --to-dir ./downloads` buttons. Commands include architecture hints for multi-platform images.
- **Helm/Compose:** Buttons output `helm pull` / `helm install` with the manifest URL and `docker compose --env-file` commands referencing the downloaded bundle.
- **Offline kit:** Copy buttons produce the full verification sequence:

  ```bash
  curl -LO https://downloads.stella-ops.org/offline/stella-ops-offline-kit-2025.10.0-edge.tar.gz
  curl -LO https://downloads.stella-ops.org/offline/stella-ops-offline-kit-2025.10.0-edge.tar.gz.sig
  cosign verify-blob \
    --key https://stella-ops.org/keys/cosign.pub \
    --signature stella-ops-offline-kit-2025.10.0-edge.tar.gz.sig \
    stella-ops-offline-kit-2025.10.0-edge.tar.gz
  ```

- **Exports:** Drawer lists CLI equivalents (for example, `stella findings export --run <id>`). When the CLI supports resume tokens, the command includes `--resume-token` from the manifest entry.
- **Automation:** Webhook tab copies `curl` snippets to subscribe to `/downloads/hooks/subscribe?topic=<artifact>` and includes payload schema for integration tests.

Parity buttons write commands to the clipboard and display a toast confirming scope hints (for example, `Requires downloads.read + tenant scope`). Accessibility shortcuts (`Shift+D`) trigger the primary copy action for keyboard users.
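
The copy buttons can be thought of as small templates filled in from a manifest entry. The sketch below rebuilds the digest pull and the Offline Kit verification sequence from `downloadUrl` and `signatureUrl`; the function names are hypothetical, and the public key URL is taken from the command shown above rather than from any confirmed configuration.

```python
from posixpath import basename
from urllib.parse import urlparse

COSIGN_KEY_URL = "https://stella-ops.org/keys/cosign.pub"  # as shown in the doc's example


def digest_pull_command(image, digest):
    """Build the digest-pinned pull command for a container entry."""
    if not digest.startswith("sha256:"):
        raise ValueError("expected a sha256 digest")
    return f"docker pull {image}@{digest}"


def offline_kit_verify_commands(entry, key_url=COSIGN_KEY_URL):
    """Return the download and cosign verification commands for a bundle entry."""
    bundle = basename(urlparse(entry["downloadUrl"]).path)
    signature = basename(urlparse(entry["signatureUrl"]).path)
    return [
        f"curl -LO {entry['downloadUrl']}",
        f"curl -LO {entry['signatureUrl']}",
        f"cosign verify-blob --key {key_url} --signature {signature} {bundle}",
    ]
```

Deriving the local file names from the URLs keeps the generated `cosign verify-blob` line consistent with what `curl -LO` actually writes to disk.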

---

## 7 - Offline and air-gap workflow

- **Manifest sync:** Offline users download `manifest/offline-manifest.json` plus detached JWS and import it via `stella offline kit import`. Console highlights if the offline manifest predates the online manifest by more than 7 days.
- **Artefact staging:** The workspace enumerates removable media instructions (export to `./staging/<channel>/`) and warns when artefacts exceed configured media size thresholds.
- **Mirrors:** Buttons copy `oras copy` commands that mirror images to an internal registry (`registry.<tenant>.internal`). Operators can toggle `--insecure-policy` if the destination uses custom trust roots.
- **Parity checks:** `downloads.offlineParity` flag surfaces the latest parity report verifying that Offline Kit contents match the downloads manifest digests. If a diff is detected, the UI raises a banner linking to remediation steps.
- **Audit logging:** Every download command triggered from the UI emits `ui.download.commandCopied` with artifact ID, digest, and tenant. Logs feed the evidence locker so air-gap imports can demonstrate provenance.

---

## 8 - Observability and quotas

| Signal | Source | Description |
|--------|--------|-------------|
| `ui_download_manifest_refresh_seconds` | Console metrics | Measures time to fetch and verify manifest. Targets < 3 s. |
| `ui_download_export_queue_depth` | `/console/downloads` API | Number of pending exports (per tenant). Surfaces as card and Grafana panel. |
| `ui_download_command_copied_total` | Console logs | Count of copy actions by artifact type, used to gauge CLI parity adoption. |
| `downloads.export.duration` | Export orchestrator | Duration histograms for bundle generation; alerts if P95 > 60 s. |
| `downloads.quota.remaining` | Authority quota service | Anonymous users limited to 33 exports/day, verified users 333/day. Banner turns amber at 90 % usage as per platform policy. |

Telemetry entries include correlation IDs that match backend manifest refresh logs and export job records to keep troubleshooting deterministic.
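
The quota banner rule in the table can be expressed as a short calculation. The daily limits and the 90 % amber threshold come from the table; the `"exhausted"` level and the user-kind keys are assumptions added so the sketch has a defined result at the limit.

```python
DAILY_EXPORT_LIMITS = {"anonymous": 33, "verified": 333}
AMBER_PERCENT = 90  # banner turns amber at 90 % usage


def quota_banner(user_kind, used_today):
    """Return (remaining exports, banner level) for the current day."""
    limit = DAILY_EXPORT_LIMITS[user_kind]
    remaining = max(0, limit - used_today)
    if used_today >= limit:
        level = "exhausted"  # assumption: not described in the source table
    elif used_today * 100 >= limit * AMBER_PERCENT:
        level = "amber"      # integer maths avoids float rounding at the boundary
    else:
        level = "ok"
    return remaining, level
```

For an anonymous user this means the banner stays neutral through 29 exports and turns amber at the 30th, since 30 of 33 is the first count at or above 90 %.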

---

## 9 - References

- `/docs/ui/console-overview.md` - primary shell, tenant controls, SSE ticker.
- `/docs/ui/navigation.md` - route ownership and keyboard shortcuts.
- `/docs/ui/sbom-explorer.md` - export flows feeding the downloads queue.
- `/docs/ui/runs.md` - evidence bundle integration.
- `/docs/24_OFFLINE_KIT.md` - offline kit packaging and verification.
- `/docs/security/console-security.md` - scopes, CSP, and download token handling (pending).
- `/docs/cli-vs-ui-parity.md` - CLI equivalence checks (pending).
- `deploy/releases/*.yaml` - source of container digests mirrored into the manifest.

---

## 10 - Compliance checklist

- [ ] Manifest schema documented (fields, signature, caching) and sample kept current.
- [ ] Artefact categories mapped to manifest entries and parity workflows.
- [ ] Download statuses, retention, and cancellation rules explained.
- [ ] CLI copy-to-clipboard commands mirror console actions with scope hints.
- [ ] Offline/air-gap parity workflow, mirror commands, and audit logging captured.
- [ ] Observability metrics and quota signalling documented.
- [ ] References cross-linked to adjacent docs (navigation, exports, offline kit).
- [ ] Accessibility shortcuts and copy-to-clipboard behaviour noted with compliance reminder.

---

*Last updated: 2025-10-27 (Sprint 23).*
179
docs/ui/findings.md
Normal file
@@ -0,0 +1,179 @@
# StellaOps Console - Findings

> **Audience:** Policy Guild, Console UX team, security analysts, customer enablement.
> **Scope:** Findings list UX, filters, saved views, explain drawer, exports, CLI parity, real-time updates, and offline considerations for Sprint 23.

The Findings workspace visualises materialised policy verdicts produced by the Policy Engine. It lets analysts triage affected components, inspect explain traces, compare policy views, and export evidence while respecting Aggregation-Only guardrails.

---

## 1. Access and prerequisites

- **Route:** `/console/findings` with optional panel parameters (e.g., `/console/findings?panel=explain&finding=<id>`).
- **Scopes:** `findings.read` (list), `policy:runs` (view run metadata), `policy:simulate` (stage simulations), `downloads.read` (export bundles).
- **Prerequisites:** Policy Engine v2 (`policy_run` and `effective_finding_*` endpoints), Concelier/Excititor feeds for provenance, SBOM Service for component metadata.
- **Feature flags:** `findings.explain.enabled`, `findings.savedViews.enabled`, `findings.simulationDiff.enabled`.
- **Tenancy:** All queries include tenant context; cross-tenant comparisons require explicit admin scope and render split-pane view.
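
Deep links into the explain drawer follow the route pattern shown in the first bullet. A minimal sketch of building such a link, assuming only the `panel` and `finding` query parameters from that example (the helper name is hypothetical):

```python
from urllib.parse import urlencode

FINDINGS_ROUTE = "/console/findings"


def findings_url(panel=None, finding=None):
    """Build a Findings deep link, URL-encoding any supplied parameters."""
    params = {}
    if panel:
        params["panel"] = panel
    if finding:
        params["finding"] = finding
    return f"{FINDINGS_ROUTE}?{urlencode(params)}" if params else FINDINGS_ROUTE
```

Encoding the finding identifier matters in practice because identifiers derived from PURLs or digests may contain characters that are not safe in a query string.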
|
||||
|
||||
---
|
||||
|
||||
## 2. Layout overview
|
||||
|
||||
```
|
||||
+-------------------------------------------------------------------+
|
||||
| Header: Tenant badge - policy selector - global filters - actions |
|
||||
+-------------------------------------------------------------------+
|
||||
| Top row cards: Affected assets - Critical count - KEV count |
|
||||
+-------------------------------------------------------------------+
|
||||
| Findings grid (virtualised) |
|
||||
| Columns: Status | Severity | Component | Policy | Source | Age |
|
||||
| Row badges: KEV, Quieted, Override, Simulation only |
|
||||
+-------------------------------------------------------------------+
|
||||
| Right drawer / full view tabs: Summary | Explain | Evidence | Run |
|
||||
+-------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
The policy selector includes Active, Staged, and Simulation snapshots. Switching snapshots triggers diff banners to highlight changes.
|
||||
|
||||
---
|
||||
|
||||
## 3. Filters and saved views
|
||||
|
||||
| Filter | Description | Notes |
|
||||
|--------|-------------|-------|
|
||||
| **Status** | `affected`, `at_risk`, `quieted`, `fixed`, `not_applicable`, `mitigated`. | Status definitions align with Policy Engine taxonomy. |
|
||||
| **Severity** | Critical, High, Medium, Low, Informational, Untriaged. | Derived from policy scoring; UI displays numeric score tooltip. |
|
||||
| **KEV** | Toggle to show only Known Exploited Vulnerabilities. | Pulls from Concelier enrichment. |
|
||||
| **Policy** | Active, Staged, Simulation snapshots. | Simulation requires recent run; otherwise greyed out. |
|
||||
| **Component** | PURL or substring search. | Autocomplete draws from current tenant findings. |
|
||||
| **SBOM** | Filter by image digest or SBOM ID. | Includes quick links to SBOM Explorer. |
|
||||
| **Tag** | Team or environment tags emitted by Policy Engine (`tags[]`). | Supports multi-select. |
|
||||
| **Run window** | `Last 24h`, `Last 7d`, `Custom range`. | Applies to run timestamp. |
|
||||
| **Explain hints** | Filter by explain artefact (rule ID, justification, VEX provider). | Uses server-side filter parameters. |
|
||||
|
||||
Saved views persist filter combinations per tenant and policy. Users can mark views as shared; shared views appear in the left rail with owner and last updated timestamp. Keyboard shortcuts align with global presets (`Cmd+1-9 / Ctrl+1-9`).
|
||||
|
||||
---
|
||||
|
||||
## 4. Findings grid
|
||||
|
||||
| Column | Details |
|
||||
|--------|---------|
|
||||
| **Status** | Badge with tooltip describing resolution path (e.g., "Affected - blocked by policy rule R-105"). Quieted findings show a muted badge with expiry. |
|
||||
| **Severity** | Numeric score and label. Hover reveals scoring formula and evidence sources. |
|
||||
| **Component** | PURL plus human-friendly name. Includes SBOM badge linking to SBOM Explorer detail. |
|
||||
| **Policy** | Policy name + revision digest; clicking opens policy diff in new tab. |
|
||||
| **Source signals** | Icons for VEX, Advisory, Runtime overlays. Hover shows counts and last updated timestamps. |
|
||||
| **Age** | Time since finding was last evaluated; colour-coded when exceeding SLA. |
|
||||
|
||||
Row indicators:
|
||||
|
||||
- **KEV** badge when Concelier marks the vulnerability as exploited.
|
||||
- **Override** badge when policy override or exemption applied.
|
||||
- **Simulation only** badge when viewing simulation snapshot; warns that finding is not yet active.
|
||||
- **Determinism alert** icon if latest run reported a determinism mismatch (links to run detail).
|
||||
|
||||
Bulk actions (multi-select):
|
||||
|
||||
- `Open explains` (launch explain drawer for up to 10 findings).
|
||||
- `Export CSV/JSON`.
|
||||
- `Copy CLI` commands for batch explains (`stella findings explain --batch file`).
|
||||
- `Create ticket` (uses the ticketing integrations configured under Admin).
|
||||
|
||||
---
|
||||
|
||||
## 5. Explain drawer
|
||||
|
||||
Tabs inside the explain drawer:
|
||||
|
||||
1. **Summary** - status, severity, policy decision, rule ID, last evaluated timestamp, SBOM link, run ID.
|
||||
2. **Rule chain** - ordered list of policy rules triggered; each entry shows rule ID, name, action (block/warn/quiet), score contribution, and condition snippet.
|
||||
3. **Evidence** - references to Concelier advisories, Excititor consensus, runtime signals, and overrides. Evidence entries link to their respective explorers.
|
||||
4. **VEX impact** - table of VEX claims considered; displays provider, status, justification, acceptance (accepted/ignored), weight.
|
||||
5. **History** - timeline of state transitions (affected -> quieted -> mitigated) with timestamps and operators (if override applied).
|
||||
6. **Raw trace** - canonical JSON trace from Policy Engine (read-only). CLI parity snippet:
|
||||
- `stella findings explain --policy <id> --finding <key> --format json`.
|
||||
|
||||
Explain drawer includes copy-to-clipboard buttons for rule chain and evidence JSON to support audit workflows. When sealed mode is active, a banner highlights which evidence was sourced from cached data.
|
||||
|
||||
---
|
||||
|
||||
## 6. Simulations and comparisons
|
||||
|
||||
- Simulation toggle lets analysts compare Active vs Staged/Sandbox policies.
|
||||
- Diff banner summarises added, removed, and changed findings.
|
||||
- Side-by-side view shows baseline vs simulation verdicts with change badges (`added`, `removed`, `severity up`, `severity down`).
|
||||
- CLI parity callout: `stella policy simulate --policy <id> --sbom <sbomId> --format diff`.
|
||||
- Simulation results persist for 7 days; stale simulations prompt re-run recommendation.
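
The change badges above can be illustrated with a small Python sketch. It assumes each verdict is identified by a finding key and carries a severity label; `Verdict`, `diff_verdicts`, and the severity ordering are hypothetical names for illustration, not the Policy Engine's actual data model. `Untriaged` is left out because its rank relative to the other labels is not stated here.

```python
from dataclasses import dataclass

# Assumed ordering, lowest to highest; not taken from the Policy Engine.
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Verdict:
    finding_key: str
    severity: str


def diff_verdicts(baseline, simulation):
    """Map each changed finding key to one of the four change badges."""
    base = {v.finding_key: v for v in baseline}
    sim = {v.finding_key: v for v in simulation}
    badges = {}
    for key in sorted(base.keys() | sim.keys()):
        if key not in base:
            badges[key] = "added"
        elif key not in sim:
            badges[key] = "removed"
        else:
            before = SEVERITY_ORDER.index(base[key].severity)
            after = SEVERITY_ORDER.index(sim[key].severity)
            if after > before:
                badges[key] = "severity up"
            elif after < before:
                badges[key] = "severity down"
            # Unchanged findings get no badge.
    return badges
```

Counting the values of the returned mapping gives the totals a diff banner would summarise.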
|
||||
|
||||
---
|
||||
|
||||
## 7. Exports and automation
|
||||
|
||||
- Immediate exports: CSV, JSON, Markdown summary for selected findings.
|
||||
- Scheduled exports: asynchronous job to generate full tenant report (JSON + CSV) with manifest digests.
|
||||
- Explain bundle export packages traces for a set of findings; includes manifest and hash for offline review.
|
||||
- CLI parity:
|
||||
- `stella findings ls --policy <id> --format json --output findings.json`
|
||||
- `stella findings export --policy <id> --format csv --output findings.csv`
|
||||
- `stella findings explain --batch batch.txt --output explains/`
|
||||
- Automation: webhook copy button for `/downloads/hooks/subscribe?topic=findings.report.ready`.
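
For offline review it can help to check exported files against their manifest digests. The Python sketch below assumes a manifest that maps file names to SHA-256 hex digests; that layout, and the helper names `build_manifest` and `verify_manifest`, are assumptions for illustration rather than the documented export format.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir: Path) -> dict[str, str]:
    """Assumed manifest shape: {file name: SHA-256 hex digest}."""
    return {
        p.name: sha256_file(p)
        for p in sorted(export_dir.iterdir())
        if p.is_file()
    }


def verify_manifest(export_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose digest differs."""
    mismatches = []
    for name, expected in manifest.items():
        path = export_dir / name
        if not path.is_file() or sha256_file(path) != expected:
            mismatches.append(name)
    return mismatches
```

An empty list from `verify_manifest` means every listed file is present and unchanged.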
|
||||
|
||||
---
|
||||
|
||||
## 8. Real-time updates and observability
|
||||
|
||||
- SSE channel `/console/findings/stream` pushes new findings, status changes, and quieted expirations; UI animates affected rows.
|
||||
- Header cards show metrics: `findings_critical_total`, `findings_quieted_total`, `findings_kev_total`.
|
||||
- Run ticker lists latest policy runs with status, duration, determinism hash.
|
||||
- Error banners include correlation IDs linking to Policy Engine run logs.
|
||||
- Metrics drill-down links to dashboards (OpenTelemetry, Prometheus).
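
As a rough illustration of consuming the findings stream outside the browser, the Python sketch below parses Server-Sent Events framing (`event:` and `data:` fields, blank line as delimiter, `:` comments). The event names and JSON payload shape in the usage note are invented; the actual messages on `/console/findings/stream` are not specified in this guide.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event type, decoded JSON payload) for each complete SSE event."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_type, json.loads("\n".join(data_lines))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
```

Feeding it the lines of a response body, for example a hypothetical `event: finding.status_changed` followed by a JSON `data:` line and a blank line, yields one tuple per event.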
|
||||
|
||||
---
|
||||
|
||||
## 9. Offline and air-gap behaviour
|
||||
|
||||
- Offline banner indicates snapshot ID and timestamp used for findings.
|
||||
- Explain drawer notes when evidence references offline bundles; suggests importing updated advisories/VEX to refresh results.
|
||||
- Exports default to local storage paths; UI provides manual transfer instructions.
|
||||
- CLI examples switch to include `--sealed` or `--offline` flags.
|
||||
- Tenant selector hides tenants without corresponding offline findings data to avoid partial views.
|
||||
|
||||
---
|
||||
|
||||
## 10. Screenshot coordination
|
||||
|
||||
- Placeholders:
|
||||
- ``
|
||||
- ``
|
||||
- Coordinate with Console Guild (Slack `#console-screenshots`, entry 2025-10-26) to capture updated light and dark theme shots before release.
|
||||
|
||||
---
|
||||
|
||||
## 11. References
|
||||
|
||||
- `/docs/ui/console-overview.md` - shell, filters, tenant model.
|
||||
- `/docs/ui/navigation.md` - route list, deep-link schema.
|
||||
- `/docs/ui/advisories-and-vex.md` - advisory and VEX context feeding findings.
|
||||
- `/docs/ui/policies.md` (pending) - editor and policy lifecycle.
|
||||
- `/docs/policy/overview.md` - Policy Engine outputs.
|
||||
- `/docs/policy/runs.md` - run orchestration.
|
||||
- `/docs/cli/policy.md` - CLI parity for findings commands.
|
||||
|
||||
---
|
||||
|
||||
## 12. Compliance checklist
|
||||
|
||||
- [ ] Filters and saved view behaviour documented with CLI alignment.
|
||||
- [ ] Findings grid columns, badges, and bulk actions captured.
|
||||
- [ ] Explain drawer walkthrough includes rule chain, evidence, and raw trace.
|
||||
- [ ] Simulation diff behaviour and CLI callouts described.
|
||||
- [ ] Exports (immediate and scheduled) plus webhook integration covered.
|
||||
- [ ] Real-time updates, metrics, and error correlation documented.
|
||||
- [ ] Offline behaviour and screenshot coordination noted.
|
||||
- [ ] References validated.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 23).*
|
||||
|
||||
163
docs/ui/navigation.md
Normal file
@@ -0,0 +1,163 @@
|
||||
# StellaOps Console - Navigation
|
||||
|
||||
> **Audience:** Console UX writers, UI engineers, QA, and enablement teams.
|
||||
> **Scope:** Primary route map, layout conventions, keyboard shortcuts, deep-link patterns, and tenant context switching for the StellaOps Console (Sprint 23).
|
||||
|
||||
The navigation framework keeps Console workflows predictable across tenants and deployment modes. This guide explains how the global shell, feature routes, and context tokens cooperate so operators can jump between findings, SBOMs, advisories, policies, and runs without losing scope.
|
||||
|
||||
---
|
||||
|
||||
## 1. Information Architecture
|
||||
|
||||
### 1.1 Primary routes
|
||||
|
||||
| Route pattern | Module owner | Purpose | Required scopes (minimum) | Core services |
|
||||
|---------------|--------------|---------|---------------------------|---------------|
|
||||
| `/console/dashboard` | Web gateway | Landing KPIs, feed age, queue depth, alerts | `ui.read` | Web, Scheduler WebService, Concelier WebService, Excititor WebService |
|
||||
| `/console/findings` | Policy Engine | Aggregated findings, explain drawer, export | `findings.read` | Policy Engine, Concelier WebService, SBOM Service |
|
||||
| `/console/sbom` | SBOM Service | Catalog view, component graph, overlays | `sbom.read` | SBOM Service, Policy Engine (overlays) |
|
||||
| `/console/advisories` | Concelier | Advisory aggregation with provenance banners | `advisory.read` | Concelier WebService |
|
||||
| `/console/vex` | Excititor | VEX aggregation, consensus, conflicts | `vex.read` | Excititor WebService |
|
||||
| `/console/runs` | Scheduler | Run list, live progress, evidence downloads | `runs.read` | Scheduler WebService, Policy Engine, Scanner WebService |
|
||||
| `/console/policies` | Policy Engine | Editor, simulations, approvals | `policy.read` (read) / `policy.write` (edit) | Policy Engine, Authority |
|
||||
| `/console/downloads` | DevOps | Signed artifacts, Offline Kit parity checklist | `downloads.read` | DevOps manifest API, Offline Kit |
|
||||
| `/console/admin` | Authority | Tenants, roles, tokens, integrations | `ui.admin` (plus scoped `authority:*`) | Authority |
|
||||
| `/console/help` | Docs Guild | Guides, tours, release notes | `ui.read` | Docs static assets |
|
||||
|
||||
### 1.2 Secondary navigation elements
|
||||
|
||||
- **Left rail:** highlights the active top-level route, exposes quick metrics, and shows pinned saved views. Keyboard focus cycles through rail entries with `Tab`/`Shift+Tab`.
|
||||
- **Breadcrumb bar:** renders `Home / Module / Detail` format. Detail crumbs include IDs and titles for shareable context (for example, `Findings / High Severity / CVE-2025-1234`).
|
||||
- **Action shelf:** right-aligned controls for context actions (export, verify, retry). Buttons disable automatically if the current subject lacks the requisite scope.
|
||||
|
||||
---
|
||||
|
||||
## 2. Command Palette and Search
|
||||
|
||||
- **Trigger:** `Ctrl/Cmd + K`. Palette opens in place, keeps focus, and announces results via ARIA live region.
|
||||
- **Capabilities:** jump to routes, saved views, tenants, recent entities (findings, SBOMs, advisories), and command actions (for example, "Start verification", "Open explain drawer").
|
||||
- **Result tokens:** palette entries carry metadata (`type`, `tenant`, `filters`). Selecting an item updates the URL and applies stored filters without a full reload.
|
||||
- **Offline fallback:** in sealed/offline mode, palette restricts actions to cached routes and saved views; remote-only items show a greyed-out badge.
|
||||
|
||||
---
|
||||
|
||||
## 3. Global Filters and Context Chips
|
||||
|
||||
| Control | Shortcut | Persistence | Notes |
|
||||
|---------|----------|-------------|-------|
|
||||
| **Tenant picker** | `Ctrl/Cmd + T` | SessionStorage + URL `tenant` query | Issues fresh Authority token, invalidates caches, emits `ui.tenant.switch` log. |
|
||||
| **Filter tray** | `Shift + F` | IndexedDB (per tenant) + URL query (`since`, `severity`, `tags`, `source`, `status`, `policyView`) | Applies instantly to compatible routes; incompatible filters show a reset suggestion. |
|
||||
| **Component search** | `/` when filters closed | URL `component` query | Context-aware; scopes results to current tenant and module. |
|
||||
| **Time window** | `Ctrl/Cmd + Shift + 1-4` | URL `since`/`until`, palette preset | Mapped to preset windows: 24 h, 7 d, 30 d, custom. |
|
||||
|
||||
Context chips appear beneath page titles summarising active filters (for example, `Tenant: west-prod`, `Severity: Critical+High`, `Time: Last 7 days`). Removing a chip updates the tray and URL atomically.
|
||||
|
||||
---
|
||||
|
||||
## 4. Keyboard Shortcut Matrix
|
||||
|
||||
| Scope | Shortcut (Mac / Windows) | Action | Notes |
|
||||
|-------|--------------------------|--------|-------|
|
||||
| Global | `Cmd+K / Ctrl+K` | Open command palette | Accessible from any route except modal dialogs. |
|
||||
| Global | `Cmd+T / Ctrl+T` | Open tenant switcher | Requires `ui.read`. Confirm selection with `Enter`; `Esc` cancels without switching. |
|
||||
| Global | `Shift+F` | Toggle global filter tray | Focus lands on first filter control. |
|
||||
| Global | `Cmd+1-9 / Ctrl+1-9` | Load saved view preset | Each preset bound per tenant; non-assigned keys show tooltip. |
|
||||
| Global | `?` | Show keyboard reference overlay | Overlay lists context-specific shortcuts; closes with `Esc`. |
|
||||
| Findings module | `Cmd+/ / Ctrl+/` | Focus explain search | Works when explain drawer is open. |
|
||||
| SBOM module | `Cmd+G / Ctrl+G` | Toggle graph overlays | Persists per session. |
|
||||
| Advisories & VEX | `Cmd+Opt+F / Ctrl+Alt+F` | Focus provider filter | Highlights provider chip strip. |
|
||||
| Runs module | `Cmd+R / Ctrl+R` | Refresh SSE snapshot | Schedules soft refresh (no hard reload). |
|
||||
| Policies module | `Cmd+S / Ctrl+S` | Save draft (if edit rights) | Mirrors Policy Editor behaviour. |
|
||||
|
||||
Shortcut handling follows WCAG 2.2 best practices: all accelerators are remappable via Settings -> Accessibility -> Keyboard shortcuts, and the overlay documents platform differences.
|
||||
|
||||
---
|
||||
|
||||
## 5. Deep-Link Patterns
|
||||
|
||||
### 5.1 URL schema
|
||||
|
||||
Console URLs adopt the format:
|
||||
|
||||
```
|
||||
/console/<route>[/:id][/:tab]?tenant=<slug>&since=<iso>&severity=<list>&view=<token>&panel=<drawer>&component=<purl>
|
||||
```
|
||||
|
||||
- **`tenant`** is mandatory and matches Authority slugs (e.g., `acme-prod`).
|
||||
- **`since` / `until`** use ISO-8601 timestamps (UTC). Preset ranges set only `since`; UI computes `until` on load.
|
||||
- **`severity`** accepts comma-separated policy buckets (e.g., `critical,high,kev`).
|
||||
- **`view`** stores module-specific state (e.g., `sbomView=usage`, `findingsPreset=threat-hunting`).
|
||||
- **`panel`** selects drawers or tabs (`panel=explain`, `panel=timeline`).
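
A minimal Python sketch of how a script might assemble a URL that follows this schema. `build_deep_link` and its parameter names are hypothetical and not part of the Console or CLI; the encoding choices (colons left readable in timestamps, PURLs fully percent-encoded) are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence
from urllib.parse import quote


def build_deep_link(
    route: str,
    tenant: str,
    *,
    entity_id: Optional[str] = None,
    tab: Optional[str] = None,
    since: Optional[datetime] = None,
    severity: Optional[Sequence[str]] = None,
    view: Optional[str] = None,
    panel: Optional[str] = None,
    component: Optional[str] = None,
) -> str:
    """Build /console/<route>[/:id][/:tab]?tenant=...&... in schema order."""
    if not tenant:
        raise ValueError("tenant is mandatory")
    path = "/console/" + route
    for segment in (entity_id, tab):
        if segment:
            path += "/" + quote(segment, safe=":")
    pairs = [("tenant", quote(tenant, safe=""))]
    if since is not None:
        if since.tzinfo is None:
            raise ValueError("since must be timezone-aware")
        # ISO-8601 in UTC; `until` is left for the UI to compute on load.
        stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        pairs.append(("since", quote(stamp, safe=":")))
    if severity:
        pairs.append(("severity", quote(",".join(severity), safe=",")))
    if view:
        pairs.append(("view", quote(view, safe="")))
    if panel:
        pairs.append(("panel", quote(panel, safe="")))
    if component:
        # Percent-encode the whole PURL, including ':' '/' and '@'.
        pairs.append(("component", quote(component, safe="")))
    return path + "?" + "&".join(f"{key}={value}" for key, value in pairs)
```

Putting `tenant` first is a readability choice in this sketch; the schema does not prescribe a parameter order.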
|
||||
|
||||
### 5.2 Copyable links
|
||||
|
||||
- Share links from the action shelf or context chips; both copy canonical URLs with all active filters.
|
||||
- CLI parity: inline callouts provide `stella` commands derived from the URL parameters to ensure console/CLI equivalence.
|
||||
- Offline note: links copied in sealed mode include the snapshot ID (`snapshot=<hash>`) so recipients know which offline data set to load.
|
||||
|
||||
### 5.3 Examples
|
||||
|
||||
The examples below use the parameters described in section 5.1, plus two more:

- **`component`** encodes package selection using percent-encoded PURL syntax.
- **`snapshot`** appears when copying links offline to reference the Offline Kit build hash.

|
||||
| Use case | Example URL | Description |
|
||||
|----------|-------------|-------------|
|
||||
| Findings triage | `/console/findings?view=table&severity=critical,high&tenant=west-prod&since=2025-10-20T00:00:00Z` | Opens the findings table limited to critical/high for west-prod, last 7 days. |
|
||||
| SBOM component focus | `/console/sbom/sha256:abcd?tenant=west-prod&component=pkg:npm/react@18.3.0&view=usage` | Deep-links to a specific image digest and highlights an NPM package in Usage view. |
|
||||
| Advisory explain | `/console/advisories?tenant=west-prod&source=nvd&panel=detail&documentId=CVE-2025-1234` | Opens advisory list filtered to NVD and expands CVE detail drawer. |
|
||||
| Run monitor | `/console/runs/42?tenant=west-prod&panel=progress` | Focuses run ID 42 with progress drawer active (SSE stream attached). |
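
To sanity-check links like the ones in this table, a script could split them back into route, identifier, and parameters. The Python sketch below is one possible approach; `parse_deep_link` and the shape of its return value are hypothetical.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_deep_link(url: str) -> dict:
    """Split a Console deep link into route, id, tab, and query parameters."""
    parts = urlsplit(url)
    segments = [unquote(s) for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "console":
        raise ValueError("not a /console/<route> URL")
    # Keep the last value if a parameter is repeated.
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    if "tenant" not in params:
        raise ValueError("tenant is mandatory")
    if "severity" in params:
        params["severity"] = params["severity"].split(",")
    return {
        "route": segments[1],
        "id": segments[2] if len(segments) > 2 else None,
        "tab": segments[3] if len(segments) > 3 else None,
        "params": params,
    }
```

Parsing the run monitor example yields route `runs`, id `42`, and the `tenant` and `panel` parameters.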
|
||||
|
||||
---
|
||||
|
||||
## 6. Tenant Switching Lifecycle
|
||||
|
||||
1. **Initiate:** User triggers `Ctrl/Cmd + T` or clicks the tenant badge. Switcher modal lists authorised tenants and recent selections.
|
||||
2. **Preview:** Selecting a tenant shows summary (environment, last snapshot, role coverage). The modal flags tenants missing required scopes for the current route.
|
||||
3. **Confirm:** On confirmation, the UI requests a new DPoP-bound access token from Authority (`aud=console`, `tenant=<id>`).
|
||||
4. **Invalidate caches:** Stores keyed by tenant purge automatically; modules emit `tenantChanged` events so in-flight SSE streams reconnect with new headers.
|
||||
5. **Restore state:** Global filters reapply where valid. Incompatible filters (for example, a saved view unavailable in the new tenant) prompt users to pick a fallback.
|
||||
6. **Audit and telemetry:** `ui.tenant.switch` log writes subject, from/to tenant, correlation ID. Metric `ui_tenant_switch_total` increments for observability dashboards.
|
||||
7. **Offline behaviour:** If the target tenant is absent from the offline snapshot, switcher displays guidance to import updated Offline Kit data before proceeding.
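
The confirm, invalidate, restore, and audit steps above can be sketched in Python as follows. This is only an in-memory model under stated assumptions: `ConsoleSession`, `switch_tenant`, and the `request_token` callback are hypothetical stand-ins, and the real Console obtains a DPoP-bound token from Authority rather than calling a local function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ConsoleSession:
    tenant: str
    filters: dict
    caches: dict = field(default_factory=dict)     # keyed by tenant
    audit_log: list = field(default_factory=list)


def switch_tenant(
    session: ConsoleSession,
    target: str,
    request_token: Callable[[str], str],
    valid_filter_keys: set,
):
    """Return (new token, names of filters that could not be restored)."""
    token = request_token(target)                  # step 3: confirm
    previous = session.tenant
    session.caches.pop(previous, None)             # step 4: purge tenant-keyed store
    kept = {k: v for k, v in session.filters.items() if k in valid_filter_keys}
    dropped = sorted(set(session.filters) - set(kept))   # step 5: restore what is valid
    session.tenant, session.filters = target, kept
    session.audit_log.append(                      # step 6: audit entry
        {"event": "ui.tenant.switch", "from": previous, "to": target}
    )
    return token, dropped
```

A non-empty `dropped` list corresponds to the case where the user is prompted to pick a fallback.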
|
||||
|
||||
---
|
||||
|
||||
## 7. Breadcrumbs, Tabs, and Focus Management
|
||||
|
||||
- Breadcrumb titles update synchronously with route data loads. When fragments change (for example, selecting a finding), the breadcrumb text updates without pushing a new history entry to keep back/forward predictable.
|
||||
- Detail views rely on accessible tabs (`role="tablist"`) with keyboard support (`ArrowLeft/Right`). Tab selection updates the URL `tab` parameter for deep linking.
|
||||
- Focus management:
|
||||
- Route changes send focus to the primary heading (`h1`) using the live region announcer.
|
||||
  - Opening drawers or modals traps focus until closed; `Esc` returns focus to the triggering element.
|
||||
- Keyboard-only navigation is validated via automated Playwright accessibility checks as part of `DEVOPS-CONSOLE-23-001`.
|
||||
|
||||
---
|
||||
|
||||
## 8. References
|
||||
|
||||
- `/docs/ui/console-overview.md` - structural overview, tenant model, global filters.
|
||||
- `/docs/ui/sbom-explorer.md` - SBOM-specific navigation and graphs (pending).
|
||||
- `/docs/ui/advisories-and-vex.md` - aggregation UX details (pending).
|
||||
- `/docs/ui/findings.md` - findings filters and explain drawer (pending).
|
||||
- `/docs/security/console-security.md` - Authority, scopes, CSP.
|
||||
- `/docs/cli-vs-ui-parity.md` - CLI equivalence matrix.
|
||||
- `/docs/accessibility.md` - keyboard remapping, WCAG validation checklists.
|
||||
|
||||
---
|
||||
|
||||
## 9. Compliance Checklist
|
||||
|
||||
- [ ] Route table matches Console build (paths, scopes, owners verified with Console Guild).
|
||||
- [ ] Keyboard shortcut matrix reflects implemented accelerators and accessibility overlay.
|
||||
- [ ] Deep-link examples tested for copy/share parity and CLI alignment.
|
||||
- [ ] Tenant switching flow documents cache invalidation and audit logging.
|
||||
- [ ] Filter tray, command palette, and presets cross-referenced with accessibility guidance.
|
||||
- [ ] Offline/air-gap notes included for palette, tenant switcher, and deep-link metadata.
|
||||
- [ ] Links to dependent docs (`/docs/ui/*`, `/docs/security/*`) validated.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 23).*
|
||||
191
docs/ui/policies.md
Normal file
@@ -0,0 +1,191 @@
|
||||
# StellaOps Console - Policies Workspace
|
||||
|
||||
> **Audience:** Policy Guild, Console UX, product ops, review leads.
|
||||
> **Scope:** Policy workspace navigation, editor surfaces, simulation, approvals, RBAC, observability, offline behaviour, and CLI parity for Sprint 23.
|
||||
|
||||
The Policies workspace centralises authoring, simulation, review, and promotion for `stella-dsl@1` packs. It builds on the Policy Editor (`docs/ui/policy-editor.md`) and adds list views, governance workflows, and integrations with runs and findings.
|
||||
|
||||
---
|
||||
|
||||
## 1. Access and prerequisites
|
||||
|
||||
- **Routes:**
|
||||
- `/console/policies` (list)
|
||||
- `/console/policies/:policyId` (details)
|
||||
- `/console/policies/:policyId/:revision` (editor, approvals, runs)
|
||||
- **Scopes:**
|
||||
- `policy:read` (list and details)
|
||||
- `policy:write` (edit drafts, run lint/compile)
|
||||
- `policy:submit`, `policy:review`, `policy:approve` (workflow actions)
|
||||
- `policy:runs` (view run history)
|
||||
- `policy:simulate` (run simulations)
|
||||
- `effective:write` (promotion visibility only; actual write remains server-side)
|
||||
- **Feature flags:** `policy.studio.enabled`, `policy.simulation.diff`, `policy.runCharts.enabled`, `policy.offline.bundleUpload`.
|
||||
- **Dependencies:** Policy Engine v2 APIs (`/policies`, `/policy/runs`, `/policy/simulations`), Policy Studio Monaco assets, Authority fresh-auth flows for critical operations.
|
||||
|
||||
---
|
||||
|
||||
## 2. List and detail views
|
||||
|
||||
### 2.1 Policy list
|
||||
|
||||
| Column | Description |
|
||||
|--------|-------------|
|
||||
| **Policy** | Human-readable name plus policy ID (e.g., `P-7`). |
|
||||
| **State** | `Active`, `Draft`, `Staged`, `Simulation`, `Archived`. Badge colours align with Policy Engine status. |
|
||||
| **Revision** | Latest revision digest (short SHA). |
|
||||
| **Owner** | Primary maintainer or owning team tag. |
|
||||
| **Last change** | Timestamp and actor of last update (edit, submit, approve). |
|
||||
| **Pending approvals** | Count of outstanding approval requests (with tooltip listing reviewers). |
|
||||
|
||||
Row actions: `Open`, `Duplicate`, `Export pack`, `Run simulation`, `Compare revisions`.
|
||||
|
||||
Filters: owning team, state, tag, pending approvals, contains staged changes, last change window, simulation warnings (determinism, failed run).
|
||||
|
||||
### 2.2 Policy detail header
|
||||
|
||||
- Summary cards: current state, digest, active revision, staged revision (if any), simulation status, last production run (timestamp, duration, determinism hash).
|
||||
- Action bar: `Edit draft`, `Run simulation`, `Submit for review`, `Promote`, `Export pack`, `View findings`.
|
||||
|
||||
---
|
||||
|
||||
## 3. Editor shell
|
||||
|
||||
The editor view reuses the structure documented in `/docs/ui/policy-editor.md` and adds:
|
||||
|
||||
- **Context banner** showing tenant, policy ID, revision digest, and simulation badge if editing sandbox copy.
|
||||
- **Lint and compile status** displayed inline with time since last run.
|
||||
- **Checklist sidebar** summarising required steps (lint pass, simulation run, deterministic CI, security review). Each item links to evidence (e.g., latest simulation diff).
|
||||
- **Monaco integration** with policy-specific snippets, schema hover, code actions (`Insert allowlist`, `Add justification`).
|
||||
- **Draft autosave** every 30 seconds with conflict detection (merges disabled; last write wins with warning).
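
The autosave rule (no merges, last write wins, with a warning) can be modelled with a version counter. The Python sketch below is an assumption about how conflict detection might work; `DraftStore` and its `save` signature are hypothetical and do not describe the Console's actual API.

```python
from dataclasses import dataclass


@dataclass
class DraftStore:
    content: str = ""
    version: int = 0

    def save(self, content: str, base_version: int):
        """Store the draft unconditionally and report whether it overwrote newer work.

        `base_version` is the version the client last loaded. A mismatch means
        someone else saved in between; the write still wins, but the caller
        should surface a warning.
        """
        conflict = base_version != self.version
        self.content = content
        self.version += 1
        return self.version, conflict
```

A caller would show the conflict warning whenever the second element of the result is `True`.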
|
||||
|
||||
---
|
||||
|
||||
## 4. Simulation workflows
|
||||
|
||||
- Simulation modal accepts SBOM filter (golden set, specific SBOM IDs, tenant-wide) and options for VEX weighting overrides.
|
||||
- Simulations run asynchronously; progress shown in run ticker with status updates.
|
||||
- Diff view summarises totals: affected findings added/removed, severity up/down counts, quieted changes.
|
||||
- Side-by-side diff (Active vs Simulation) accessible directly from policy detail.
|
||||
- Export options: JSON diff, Markdown summary, CLI snippet `stella policy simulate --policy <id> --sbom <sbomId>`.
|
||||
- Simulation results cached per draft revision. Cache invalidates when draft changes or SBOM snapshot updates.
|
||||
- Simulation compliance card requires at least one up-to-date simulation before submission.
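
One way to picture the caching rule is a cache keyed by both the draft revision and the SBOM snapshot, so a change to either misses the cache. The Python sketch below assumes that keying; `SimulationCache` and its method names are hypothetical.

```python
class SimulationCache:
    """Cache simulation results per (draft digest, SBOM snapshot) pair."""

    def __init__(self):
        self._entries = {}

    def put(self, draft_digest: str, sbom_snapshot: str, result: dict) -> None:
        self._entries[(draft_digest, sbom_snapshot)] = result

    def get(self, draft_digest: str, sbom_snapshot: str):
        # A new draft digest or a new SBOM snapshot produces a different key,
        # which behaves like an invalidation of the old result.
        return self._entries.get((draft_digest, sbom_snapshot))

    def can_submit(self, draft_digest: str, sbom_snapshot: str) -> bool:
        """Submission needs at least one up-to-date simulation."""
        return self.get(draft_digest, sbom_snapshot) is not None
```

Keying on both inputs avoids explicit invalidation hooks at the cost of keeping stale entries until they are evicted.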
|
||||
|
||||
---
|
||||
|
||||
## 5. Review and approval
|
||||
|
||||
- **Review requests:** Authors tag reviewers; review sidebar lists pending reviewers, due dates, and escalation contact.
|
||||
- **Comments:** Threaded comments support markdown, mentions, and attachments (redacted before persistence). Comment resolution required before approval.
|
||||
- **Approval checklist:**
|
||||
- Lint/compile success
|
||||
- Simulation fresh (within configured SLA)
|
||||
- Determinism verification passed
|
||||
- Security review (if flagged)
|
||||
- Offline bundle prepared (optional)
|
||||
- **Fresh-auth:** Approve/promote buttons require fresh authentication; modal prompts for credentials and enforces short-lived token (<5 minutes).
|
||||
- **Approval audit:** Approval events recorded with correlation ID, digests, reviewer note, effective date, and optional ticket link.
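
The approval gate can be sketched as a function that lists what still blocks the Approve button. In the Python below, `ApprovalState`, `approval_blockers`, and the 24-hour simulation SLA default are assumptions for illustration; only the five-minute fresh-auth window comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ApprovalState:
    lint_ok: bool
    simulation_at: Optional[datetime]   # when the latest simulation finished
    determinism_ok: bool
    security_review_required: bool
    security_review_done: bool
    unresolved_comments: int
    auth_at: datetime                   # when the approver last authenticated


def approval_blockers(
    state: ApprovalState,
    now: datetime,
    simulation_sla: timedelta = timedelta(hours=24),      # assumed default
    fresh_auth_window: timedelta = timedelta(minutes=5),
) -> list:
    """Return human-readable reasons why approval is not yet allowed."""
    blockers = []
    if not state.lint_ok:
        blockers.append("lint/compile failed")
    if state.simulation_at is None or now - state.simulation_at > simulation_sla:
        blockers.append("simulation stale")
    if not state.determinism_ok:
        blockers.append("determinism verification failed")
    if state.security_review_required and not state.security_review_done:
        blockers.append("security review pending")
    if state.unresolved_comments > 0:
        blockers.append("unresolved comments")
    if now - state.auth_at >= fresh_auth_window:
        blockers.append("fresh authentication required")
    return blockers
```

An empty list means every checklist item passes; the optional offline bundle item is deliberately not modelled.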
|
||||
|
||||
---
|
||||
|
||||
## 6. Promotion and rollout
|
||||
|
||||
- Promotion dialog summarises staged changes, target tenants, release windows, and run plan (full vs incremental).
|
||||
- Operators can schedule promotion or apply immediately.
|
||||
- Promotion triggers Policy Engine to materialise new revision; console reflects status and shows run progress.
|
||||
- CLI parity: `stella policy promote --policy <id> --revision <rev> --run-mode full`.
|
||||
- Rollback guidance accessible from action bar (`Open rollback instructions`) linking to CLI command and documentation.
|
||||
|
||||
---
|
||||
|
||||
## 7. Runs and observability
|
||||
|
||||
- Runs tab displays table of recent runs with columns: run ID, type (`full`, `incremental`, `simulation`), duration, determinism hash, findings delta counts, triggered by.
|
||||
- Charts: findings trend, quieted findings trend, rule hit heatmap (top rules vs recent runs).
|
||||
- Clicking a run opens run detail drawer showing inputs (policy digest, SBOM batch hash, advisory snapshot hash), output summary, and explain bundle download.
|
||||
- Error runs display red badge; detail drawer includes correlation ID and link to Policy Engine logs.
|
||||
- SSE updates stream run status changes to keep UI real-time.
|
||||
|
||||
---
|
||||
|
||||
## 8. RBAC and governance
|
||||
|
||||
| Role | Scopes | Capabilities |
|
||||
|------|--------|--------------|
|
||||
| **Author** | `policy:read`, `policy:write`, `policy:simulate` | Create drafts, run lint/simulations, comment. |
|
||||
| **Reviewer** | `policy:read`, `policy:review`, `policy:simulate` | Leave review comments, request changes. |
|
||||
| **Approver** | `policy:read`, `policy:approve`, `policy:runs`, `policy:simulate` | Approve/promote, trigger runs, view run history. |
|
||||
| **Operator** | `policy:read`, `policy:runs`, `policy:simulate`, `effective:write` | Schedule promotions, monitor runs (no editing). |
|
||||
| **Admin** | Above plus Authority admin scopes | Manage roles, configure escalation chains. |
|
||||
|
||||
The UI disables controls that the current scopes do not allow and shows a tooltip naming the required scopes. The audit log records denied attempts (`policy.ui.action_denied`).
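
Scope gating of this kind can be sketched as a lookup from UI action to required scopes. In the Python below the scope names come from section 1, while the action identifiers, `REQUIRED_SCOPES`, and `control_state` are hypothetical.

```python
# Hypothetical action names mapped to the scopes listed in section 1.
REQUIRED_SCOPES = {
    "edit_draft": {"policy:write"},
    "run_simulation": {"policy:simulate"},
    "submit": {"policy:submit"},
    "approve": {"policy:approve"},
    "view_runs": {"policy:runs"},
}


def control_state(action: str, granted_scopes) -> dict:
    """Decide whether a control is enabled and what its tooltip should say."""
    missing = sorted(REQUIRED_SCOPES[action] - set(granted_scopes))
    if missing:
        return {"enabled": False, "tooltip": "Requires: " + ", ".join(missing)}
    return {"enabled": True, "tooltip": ""}
```

With the Reviewer scopes from the table, `run_simulation` stays enabled while `edit_draft` is disabled with a tooltip naming `policy:write`.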
|
||||
|
||||
---
|
||||
|
||||
## 9. Exports and offline bundles
|
||||
|
||||
- `Export pack` button downloads policy pack (zip) with metadata, digest manifest, and README.
|
||||
- Offline bundle uploader allows importing reviewed packs; UI verifies signatures and digests before applying.
|
||||
- Explain bundle export collects latest run explain traces for audit.
|
||||
- CLI parity:
|
||||
- `stella policy export --policy <id> --revision <rev>`
|
||||
- `stella policy bundle import --file <bundle>`
|
||||
- `stella policy bundle export --policy <id> --revision <rev>`
|
||||
- Offline mode displays banner and disables direct promotion; provides script instructions for offline runner.
|
||||
|
||||
---
|
||||
|
||||
## 10. Observability and alerts
|
||||
|
||||
- Metrics cards show `policy_run_seconds`, `policy_rules_fired_total`, `policy_determinism_failures_total`.
|
||||
- Alert banners surfaced for determinism failures, simulation stale warnings, approval SLA breaches.
|
||||
- Links to dashboards (Grafana) pre-filtered with policy ID.
|
||||
- Telemetry panel lists last emitted events (`policy.promoted`, `policy.simulation.completed`).
|
||||
|
||||
---
|
||||
|
||||
## 11. Offline and air-gap considerations
|
||||
|
||||
- In sealed mode, editor warns about cached enrichment data; simulation run button adds tooltip explaining degraded evidence.
|
||||
- Promotions queue and require manual CLI execution on authorised host; UI provides downloadable job manifest.
|
||||
- Run charts switch to snapshot data; SSE streams disabled, replaced by manual refresh button.
|
||||
- Export/download buttons label file paths for removable media transfer.
|
||||
|
||||
---
|
||||
|
||||
## 12. Screenshot coordination
|
||||
|
||||
- Placeholders:
|
||||
- ``
|
||||
- ``
|
||||
- ``
|
||||
- Coordinate with Console Guild via `#console-screenshots` (entry 2025-10-26) to replace placeholders once UI captures are ready (light and dark themes).
|
||||
|
||||
---
|
||||
|
||||
## 13. References
|
||||
|
||||
- `/docs/ui/policy-editor.md` - detailed editor mechanics.
|
||||
- `/docs/ui/findings.md` - downstream findings view and explain drawer.
|
||||
- `/docs/policy/overview.md` and `/docs/policy/runs.md` - Policy Engine contracts.
|
||||
- `/docs/security/authority-scopes.md` - scope definitions.
|
||||
- `/docs/cli/policy.md` - CLI commands for policy management.
|
||||
- `/docs/ui/console-overview.md` - navigation shell and filters.
|
||||
|
||||
---
|
||||
|
||||
## 14. Compliance checklist
|
||||
|
||||
- [ ] Policy list and detail workflow documented (columns, filters, actions).
|
||||
- [ ] Editor shell extends Policy Studio guidance with checklists and lint/simulation integration.
|
||||
- [ ] Simulation flow, diff presentation, and CLI parity captured.
|
||||
- [ ] Review, approval, and promotion workflows detailed with scope gating.
|
||||
- [ ] Runs dashboard, metrics, and SSE behaviour described.
|
||||
- [ ] Exports and offline bundle handling included.
|
||||
- [ ] Offline/air-gap behaviour and screenshot coordination recorded.
|
||||
- [ ] References validated.
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-26 (Sprint 23).*
|
||||
|
||||
178
docs/ui/policy-editor.md
Normal file
@@ -0,0 +1,178 @@
|
||||
# Policy Editor Workspace
|
||||
|
||||
> **Audience:** Product/UX, UI engineers, policy authors/reviewers using the Console.
|
||||
> **Scope:** Layout, features, RBAC, a11y, simulation workflow, approvals, run dashboards, and offline considerations for the Policy Engine v2 editor (“Policy Studio”).
|
||||
|
||||
The Policy Editor is the primary Console workspace for composing, simulating, and approving `stella-dsl@1` policies. It combines Monaco-based editing, diff visualisations, and governance tools so authors and reviewers can collaborate without leaving the browser.
|
||||
|
||||
---
|
||||
|
||||
## 1 · Access & Prerequisites
|
||||
|
||||
- **Routes:** `/console/policy` (list) → `/console/policy/:policyId/:version?`.
|
||||
- **Scopes:**
|
||||
- `policy:write` to edit drafts, run lint/compile, attach simulations.
|
||||
- `policy:submit` / `policy:review` / `policy:approve` for workflow actions.
|
||||
- `policy:run` to trigger runs, `policy:runs` to inspect history.
|
||||
- `findings:read` to open explain drawers.
|
||||
- **Feature flags:** `policyStudio.enabled` (defaults true once Policy Engine v2 API available).
|
||||
- **Browser support:** Evergreen Chrome, Edge, Firefox, Safari (last two versions). Uses WASM OPA sandbox; ensure COOP/COEP enabled per [UI architecture](../ARCHITECTURE_UI.md).
|
||||
|
||||
---

## 2 · Workspace Layout

```
┌────────────────────────────────────────────────────────────────────────────┐
│ Header: Policy selector • tenant switch • last activation banner │
├────────────────────────────────────────────────────────────────────────────┤
│ Sidebar (left) │ Main content (right) │
│ - Revision list │ ┌───────────── Editor tabs ───────────────┐ │
│ - Checklist status │ │ DSL │ Simulation │ Approvals │ ... │ │
│ - Pending reviews │ └─────────────────────────────────────────┘ │
│ - Run backlog │ │
│ │ Editor pane / Simulation diff / Run viewer │
└────────────────────────────────────────────────────────────────────────────┘
```

- **Sidebar:** Revision timeline (draft, submitted, approved), compliance checklist cards, outstanding review requests, run backlog (incremental queue depth and SLA).
- **Editor tabs:**
  - *DSL* (primary Monaco editor)
  - *Simulation* (pre/post diff charts)
  - *Approvals* (comments, audit log)
  - *Runs* (heatmap dashboards)
  - *Explain Explorer* (optional drawer for findings)
- **Right rail:** context cards for VEX providers, policy metadata, quick links to CLI/API docs.

> Placeholder screenshot: `` (add after UI team captures latest build).

---

## 3 · Editing Experience

- Monaco editor configured for `stella-dsl@1`:
  - Syntax highlighting, IntelliSense for rule/action names, snippets for common patterns.
  - Inline diagnostics sourced from `/policies/{id}/lint` and `/compile`.
  - Code actions (“Fix indentation”, “Insert requireVex block”).
  - Mini-map disabled by default to reduce contrast noise; toggle available.
- **Keyboard shortcuts (accessible via `?`):**
  - `Ctrl/Cmd + S` – Save draft (uploads to API if changed).
  - `Ctrl/Cmd + Shift + Enter` – Run lint + compile.
  - `Ctrl/Cmd + Shift + D` – Open diff view vs baseline.
  - `Alt + Shift + F` – Format document (canonical ordering).
- **Schema tooltips:** Hover on `profile`, `rule`, `action` to view documentation (sourced from DSL doc).
- **Offline warnings:** When `sealed` mode detected, banner reminds authors to validate with offline bundle.

---

## 4 · Simulation & Diff Panel

- Triggered via “Run simulation” (toolbar) or automatically after compile.
- Displays:
  - **Summary cards:** total findings added/removed/unchanged; severity up/down counts.
  - **Rule hit table:** top rules contributing to diffs with percentage change.
  - **Component list:** virtualised table linking to explain drawer; supports filters (severity, status, VEX outcome).
  - **Visualisations:** stacked bar chart (severity deltas), sparkline for incremental backlog impact.
- Supports run presets:
  - `Golden SBOM set` (default)
  - Custom SBOM selection (via multi-select and search)
  - Import sample JSON from CLI (`Upload diff`).
- Diff export options:
  - `Download JSON` (same schema as CLI output)
  - `Copy as Markdown` for review comments
- Simulation results persist per draft version; history accessible via timeline.

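
The summary-card counts can be illustrated with a small sketch. It assumes each simulation side is reduced to a mapping of finding ID to severity; that shape, the `summarise_diff` name, and the severity ranking are hypothetical and are not taken from the Policy Engine schema.

```python
# Illustrative severity ranking; the real buckets are defined by Policy Engine.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def summarise_diff(baseline, simulated):
    """Count added/removed/unchanged findings and severity shifts.

    Both arguments are hypothetical {finding_id: severity} mappings.
    """
    summary = {
        "added": 0,
        "removed": 0,
        "unchanged": 0,
        "severity_up": 0,
        "severity_down": 0,
    }
    for finding_id, new_severity in simulated.items():
        if finding_id not in baseline:
            summary["added"] += 1
            continue
        old_rank = SEVERITY_RANK[baseline[finding_id]]
        new_rank = SEVERITY_RANK[new_severity]
        if new_rank > old_rank:
            summary["severity_up"] += 1
        elif new_rank < old_rank:
            summary["severity_down"] += 1
        else:
            summary["unchanged"] += 1
    # Findings present before the draft but absent after it.
    summary["removed"] = sum(1 for fid in baseline if fid not in simulated)
    return summary
```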
---

## 5 · Review & Approval Workflow

- **Commenting:** Line-level comments anchored to DSL lines; global comments supported. Uses rich text (Markdown subset) with mention support (`@group/sec-reviewers`).
- **Resolution:** Approvers/reviewers can mark a comment resolved; history preserved in timeline.
- **Approval pane:**
  - Checklist (lint, simulation, determinism CI) with status indicators; links to evidence.
  - Reviewer checklist (quorum, blocking comments).
  - Approval button only enabled when checklist satisfied.
- **Audit log:** Chronological view of submit/review/approve/archive events with actor, timestamp, note, attachments.
- **RBAC feedback:** When user lacks permission, actions are disabled with tooltip referencing required scope(s).
- **Notifications:** Integration with Notifier: subscribe to or unsubscribe from review reminders within the panel.

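
The RBAC feedback rule can be sketched as a lookup from action to required scopes. The scope strings come from the scope list at the top of this page; the action names, the `action_state` helper, and the tooltip wording are illustrative assumptions rather than the console's actual code.

```python
# Scope required per workflow action (scope strings from this page;
# the action keys are hypothetical labels for this sketch).
REQUIRED_SCOPES = {
    "edit_draft": {"policy:write"},
    "submit": {"policy:submit"},
    "review": {"policy:review"},
    "approve": {"policy:approve"},
    "trigger_run": {"policy:run"},
    "view_run_history": {"policy:runs"},
    "open_explain": {"findings:read"},
}


def action_state(action, granted_scopes):
    """Return whether an action is enabled and, if not, a tooltip naming the missing scopes."""
    missing = sorted(REQUIRED_SCOPES[action] - set(granted_scopes))
    if not missing:
        return {"enabled": True, "tooltip": ""}
    return {"enabled": False, "tooltip": "Requires scope: " + ", ".join(missing)}
```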
---

## 6 · Runs & Observability

- **Runs tab** consumes `/policy/runs` data:
  - Heatmap of rule hits per run (rows = runs, columns = top rules).
  - VEX override counter, suppressions, quieted findings metrics.
  - Incremental backlog widget (queue depth vs SLA).
  - Export CSV/JSON button.
- **Replay/Download:** For each run, actions to download sealed replay bundle or open CLI command snippet.
- **Alert banners:**
  - Determinism mismatch (red)
  - SLA breach (amber)
  - Pending replay (info)

---

## 7 · Explain & Findings Integration

- Inline “Open in Findings” button for any diff entry; opens side drawer with explain trace (same schema as `/findings/*/explain`).
- Drawer includes:
  - Rule sequence with badges (block/warn/quiet).
  - VEX evidence and justification codes.
  - Links to advisories (Concelier) and SBOM components.
- Copy-to-clipboard (JSON) and “Share permalink” features (permalinks encode tenant, policy version, component).

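
One way a permalink could carry tenant, policy version, and component is shown below. The path reuses the `/console/policy/:policyId/:version?` route from this page, but the query parameter names (`tenant`, `component`) and both helper functions are assumptions made for illustration; the real permalink format is not specified here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_permalink(tenant, policy_id, version, component_purl):
    """Encode tenant and component as query parameters (hypothetical layout)."""
    query = urlencode({"tenant": tenant, "component": component_purl})
    return f"/console/policy/{policy_id}/{version}?{query}"


def parse_permalink(link):
    """Recover the encoded values; assumes policy_id and version contain no '/'."""
    parts = urlsplit(link)
    _, _, _, policy_id, version = parts.path.split("/")
    query = parse_qs(parts.query)
    return {
        "tenant": query["tenant"][0],
        "policy_id": policy_id,
        "version": version,
        "component": query["component"][0],
    }
```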
---

## 8 · Accessibility & i18n

- WCAG 2.2 AA:
  - Focus order follows logical workflow; skip link available.
  - All actionable icons paired with text or `aria-label`.
  - Simulation charts include table equivalents for screen readers.
- Keyboard support:
  - `Alt+1/2/3/4` to switch tabs.
  - `Shift+?` toggles help overlay (with key map).
- Internationalisation:
  - Translations sourced from `/locales/{lang}.json`.
  - Date/time displayed using user locale via `Intl.DateTimeFormat`.
- Theming:
  - Light/dark CSS tokens; Monaco theme syncs with overall theme.
  - High-contrast mode toggled via user preferences.

---

## 9 · Offline & Air-Gap Behaviour

- When console operates in sealed enclave:
  - Editor displays “Sealed mode” banner with import timestamp.
  - Simulation uses cached SBOM/advisory/VEX data only; results flagged accordingly.
  - “Export bundle” button packages draft + simulations for transfer.
- Approvals require local Authority; UI blocks actions if `policy:approve` scope absent due to offline token limitations.
- Runs tab surfaces bundle staleness warnings (`policy_runs.inputs.env.sealed=true`).

---

## 10 · Telemetry & Testing Hooks

- User actions (simulate, submit, approve, activate) emit telemetry (`ui.policy.action` spans) with anonymised metadata.
- Console surfaces correlation IDs for lint/compile errors to ease support triage.
- Cypress/Playwright fixtures available under `ui/policy-editor/examples/`; re-record them after significant UI changes.

---

## 11 · Compliance Checklist

- [ ] **Lint integration:** Editor surfaces diagnostics from API compile endpoint; errors link to DSL documentation.
- [ ] **Simulation parity:** Diff panel mirrors CLI schema; export button tested.
- [ ] **Workflow RBAC:** Buttons enable/disable correctly per scope (`policy:write/submit/review/approve`).
- [ ] **A11y verified:** Keyboard navigation, focus management, colour contrast (light/dark) pass automated Axe checks.
- [ ] **Offline safeguards:** Sealed-mode banner and bundle export flows present; no network calls trigger in sealed mode.
- [ ] **Telemetry wired:** Action spans and error logs include policyId, version, traceId.
- [ ] **Docs cross-links:** Links to DSL, lifecycle, runs, API, CLI guides validated.
- [ ] **Screenshot placeholders updated:** Replace TODO images with latest UI captures before GA.

---

*Last updated: 2025-10-26 (Sprint 20).*

169
docs/ui/runs.md
Normal file
@@ -0,0 +1,169 @@
# StellaOps Console - Runs Workspace

> **Audience:** Scheduler Guild, Console UX, operators, support engineers.
> **Scope:** Runs dashboard, live progress, queue management, diffs, retries, evidence downloads, observability, troubleshooting, and offline behaviour (Sprint 23).

The Runs workspace surfaces Scheduler activity across tenants: upcoming schedules, active runs, progress, deltas, and evidence bundles. It helps operators monitor backlog, drill into run segments, and recover from failures without leaving the console.

---

## 1. Access and prerequisites

- **Route:** `/console/runs` (list) with detail drawer `/console/runs/:runId`. SSE stream at `/console/runs/:runId/stream`.
- **Scopes:** `runs.read` (baseline), `runs.manage` (cancel/retry), `policy:runs` (view policy deltas), `downloads.read` (evidence bundles).
- **Dependencies:** Scheduler WebService (`/runs`, `/schedules`, `/preview`), Scheduler Worker event feeds, Policy Engine run summaries, Scanner WebService evidence endpoints.
- **Feature flags:** `runs.dashboard.enabled`, `runs.sse.enabled`, `runs.retry.enabled`, `runs.evidenceBundles`.
- **Tenancy:** Tenant selector filters list; cross-tenant admins can pin multiple tenants side-by-side (split view).

---

## 2. Layout overview

```
+-------------------------------------------------------------------+
| Header: Tenant badge - schedule selector - backlog metrics |
+-------------------------------------------------------------------+
| Cards: Active runs - Queue depth - New findings - KEV deltas |
+-------------------------------------------------------------------+
| Tabs: Active | Completed | Scheduled | Failures |
+-------------------------------------------------------------------+
| Runs table (virtualised) |
| Columns: Run ID | Trigger | State | Progress | Duration | Deltas |
+-------------------------------------------------------------------+
| Detail drawer: Summary | Segments | Deltas | Evidence | Logs |
+-------------------------------------------------------------------+
```

The header integrates the status ticker to show ingestion deltas and planner heartbeat.

---

## 3. Runs table

| Column | Description |
|--------|-------------|
| **Run ID** | Deterministic identifier (`run:<tenant>:<timestamp>:<nonce>`). Clicking opens detail drawer. |
| **Trigger** | `cron`, `manual`, `feedser`, `vexer`, `policy`, `content-refresh`. Tooltip lists schedule and initiator. |
| **State** | Badges: `planning`, `queued`, `running`, `completed`, `cancelled`, `error`. Errors include error code (e.g., `ERR_RUN_005`). |
| **Progress** | Percentage + processed/total candidates. SSE updates increment in real time. |
| **Duration** | Elapsed time (auto-updating). Completed runs show total duration; running runs show timer. |
| **Deltas** | Count of findings deltas (`+critical`, `+high`, `-quieted`, etc.). Tooltip expands severity breakdown. |

Row badges include `KEV first`, `Content refresh`, `Policy promotion follow-up`, and `Retry`. Selecting multiple rows enables bulk downloads and exports.

Filters: trigger type, state, schedule, severity impact (critical/high), policy revision, timeframe, planner shard, error code.

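
The Run ID pattern `run:<tenant>:<timestamp>:<nonce>` can be split as sketched below. This assumes the tenant and nonce contain no colons while the timestamp may (for example an ISO 8601 value); the exact timestamp encoding is not stated on this page, and `parse_run_id` is a hypothetical helper.

```python
def parse_run_id(run_id):
    """Split a run ID into tenant, timestamp, and nonce.

    Assumption: only the timestamp segment may itself contain ':'.
    """
    prefix, _, rest = run_id.partition(":")
    if prefix != "run" or not rest:
        raise ValueError(f"not a run ID: {run_id!r}")
    tenant, _, rest = rest.partition(":")
    # Take the nonce from the right so colons inside the timestamp survive.
    timestamp, _, nonce = rest.rpartition(":")
    if not (tenant and timestamp and nonce):
        raise ValueError(f"incomplete run ID: {run_id!r}")
    return {"tenant": tenant, "timestamp": timestamp, "nonce": nonce}
```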
---

## 4. Detail drawer

Sections:

1. **Summary** - run metadata (tenant, trigger, linked schedule, planner shard count, started/finished timestamps, correlation ID).
2. **Progress** - segmented progress bar (planner, queue, execution, post-processing). Real-time updates via SSE; includes throughput (targets per minute).
3. **Segments** - table of run segments with state, target count, executor, retry count. Operators can retry failed segments individually (requires `runs.manage`).
4. **Deltas** - summary of findings changes (new findings, resolved findings, severity shifts, KEV additions). Links to Findings view filtered by run ID.
5. **Evidence** - links to evidence bundles (JSON manifest, DSSE attestation), policy run records, and explain bundles. Download buttons use `/console/exports` orchestration.
6. **Logs** - last 50 structured log entries with severity, message, correlation ID; scroll-to-live for streaming logs. `Open in logs` copies query for external log tooling.

---

## 5. Queue and schedule management

- Schedule side panel lists upcoming jobs with cron expressions, time zones, and enable toggles.
- Queue depth chart shows current backlog per tenant and schedule (planner backlog, executor backlog).
- "Preview impact" button opens modal for manual run planning (purls or vuln IDs) and shows impacted image count before launch. CLI parity: `stella runs preview --tenant <id> --file keys.json`.
- Manual run form allows selecting mode (`analysis-only`, `content-refresh`), scope, and optional policy snapshot.
- Pausing a schedule requires confirmation; UI displays earliest next run after resume.

---

## 6. Live updates and SSE stream

- SSE endpoint `/console/runs/{id}/stream` streams JSON events (`stateChanged`, `segmentProgress`, `deltaSummary`, `log`). UI reconnects with exponential backoff and heartbeat.
- Global ticker shows planner heartbeat age; banner warns after 90 seconds of silence.
- Offline mode disables SSE and falls back to polling every 30 seconds.

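
The reconnect and heartbeat rules can be sketched as two small functions. The 90-second warning threshold comes from this section; the 1-second base delay, the 30-second cap, the absence of jitter, and the function names are assumptions, since the actual backoff parameters are not documented here.

```python
HEARTBEAT_WARNING_SECONDS = 90  # banner threshold stated in this section


def backoff_delays(attempts, base_seconds=1.0, cap_seconds=30.0):
    """Delay before each reconnect attempt: doubles each time, capped.

    base_seconds and cap_seconds are illustrative defaults, not documented values.
    """
    return [min(cap_seconds, base_seconds * 2 ** attempt) for attempt in range(attempts)]


def heartbeat_is_stale(heartbeat_age_seconds):
    """True once the planner heartbeat has been silent for longer than the threshold."""
    return heartbeat_age_seconds > HEARTBEAT_WARNING_SECONDS
```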
---

## 7. Retry and remediation

- Failed segments show retry button; UI displays reason and cooldown timers. Retry actions are scope-gated and logged.
- Full run retry resets segments while preserving original run metadata; new run ID references previous run in `retryOf` field.
- "Escalate to support" button opens incident template pre-filled with run context and correlation IDs.
- Troubleshooting quick links:
  - `ERR_RUN_001` (planner lock)
  - `ERR_RUN_005` (Scanner timeout)
  - `ERR_RUN_009` (impact index stale)

  Each link points to corresponding runbook sections (`docs/ops/scheduler-runbook.md`).
- CLI parity: `stella runs retry --run <id>`, `stella runs cancel --run <id>`.

---

## 8. Evidence downloads

- Evidence tab aggregates:
  - Policy run summary (`/policy/runs/{id}`)
  - Findings delta CSV (`/downloads/findings/{runId}.csv`)
  - Scanner evidence bundle (compressed JSON with manifest)
- Downloads show size, hash, signature status.
- "Bundle for offline" packages all evidence into single tarball with manifest/digest; UI notes CLI parity (`stella runs export --run <id> --bundle`).
- Completed bundles stored in Downloads workspace for reuse (links provided).

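
After copying a bundle, an operator may want to confirm it against the hash shown in the Evidence tab. The sketch below assumes that hash is a hex-encoded SHA-256 digest, which this page does not state; the helper names are hypothetical, and signature verification is out of scope.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so large bundles are not loaded into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(stream, expected_hex):
    """Compare the computed digest with the value shown in the UI (case-insensitive)."""
    return sha256_of_stream(stream) == expected_hex.strip().lower()


# Usage with an in-memory stream; a real bundle would be opened with open(path, "rb").
demo = io.BytesIO(b"")
print(matches_expected(demo, "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"))
```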
---

## 9. Observability

- Metrics cards: `scheduler_queue_depth`, `scheduler_runs_active`, `scheduler_runs_error_total`, `scheduler_runs_duration_seconds`.
- Trend charts: queue depth (last 24h), runs per trigger, average duration, determinism score.
- Alert banners: planner lag > SLA, queue depth > threshold, repeated error codes.
- Telemetry panel lists latest events (e.g., `scheduler.run.started`, `scheduler.run.completed`, `scheduler.run.failed`).

---

## 10. Offline and air-gap behaviour

- Offline banner highlights snapshot timestamp and indicates SSE disabled.
- Manual run form switches to generate CLI script for offline execution (`stella runs submit --bundle <file>`).
- Evidence download buttons output local paths; UI reminds to copy to removable media.
- Queue charts use snapshot data; manual refresh button loads latest records from Offline Kit.
- Tenants absent from snapshot hidden to avoid partial data.

---

## 11. Screenshot coordination

- Placeholders:
  - ``
  - ``
- Coordinate with Scheduler Guild for updated screenshots after Sprint 23 UI stabilises (tracked in `#console-screenshots`, entry 2025-10-26).

---

## 12. References

- `/docs/ui/console-overview.md` - shell, SSE ticker.
- `/docs/ui/navigation.md` - route map and deep links.
- `/docs/ui/findings.md` - findings filtered by run.
- `/docs/ui/downloads.md` - download manager, export retention, CLI parity.
- `/docs/ARCHITECTURE_SCHEDULER.md` - scheduler architecture and data model.
- `/docs/policy/runs.md` - policy run integration.
- `/docs/cli/policy.md` (section 5) - CLI parity (runs commands pending).
- `/docs/ops/scheduler-runbook.md` - troubleshooting.

---

## 13. Compliance checklist

- [ ] Runs table columns, filters, and states described.
- [ ] Detail drawer sections documented (segments, deltas, evidence, logs).
- [ ] Queue management, manual run, and preview coverage included.
- [ ] SSE and live update behaviour detailed.
- [ ] Retry, remediation, and runbook references provided.
- [ ] Evidence downloads and bundle workflows documented with CLI parity.
- [ ] Offline behaviour and screenshot coordination recorded.
- [ ] References validated.

---

*Last updated: 2025-10-26 (Sprint 23).*

195
docs/ui/sbom-explorer.md
Normal file
@@ -0,0 +1,195 @@
# StellaOps Console - SBOM Explorer

> **Audience:** Console UX, SBOM Service Guild, enablement teams, customer onboarding.
> **Scope:** Catalog listing, component detail, graph overlays, exports, performance hints, and offline behaviour for the SBOM Explorer that ships in Sprint 23.

The SBOM Explorer lets operators inspect software bills of materials collected by Scanner and normalised by the SBOM Service. It provides tenant-scoped catalogs, usage overlays, provenance-aware graphs, and deterministic export paths that align with CLI workflows.

---

## 1. Access and prerequisites

- **Routes:** `/console/sbom` (catalog) and `/console/sbom/:digest` (detail).
- **Scopes:** `sbom.read` (required), `sbom.export` for large export jobs, `findings.read` to open explain drawers, `policy.read` to view overlay metadata.
- **Feature flags:** `sbomExplorer.enabled` (default true when SBOM Service v3 API is enabled) and `graph.overlays.enabled` for Cartographer-backed overlays.
- **Tenant scoping:** All queries include `tenant` tokens; switching tenants triggers catalog refetch and clears cached overlays.
- **Data dependencies:** Requires SBOM Service 3.1+ with Cartographer overlays and Policy Engine explain hints enabled.

---

## 2. Layout overview

```
+-----------------------------------------------------------------------+
| Header: Tenant badge - global filters - offline indicator - actions |
+-----------------------------------------------------------------------+
| Left rail: Saved views - pinned tags - export queue status |
+-----------------------------------------------------------------------+
| Catalog table (virtualised) |
| - Columns: Image digest - Source - Scan timestamp - Policy verdict |
| - Badges: Delta SBOM, Attested, Offline snapshot |
+-----------------------------------------------------------------------+
| Detail drawer or full page tabs (Inventory | Usage | Components | |
| Overlays | Explain | Exports) |
+-----------------------------------------------------------------------+
```

The catalog and detail views reuse the shared command palette, context chips, and SSE status ticker described in `/docs/ui/navigation.md`.

---

## 3. Catalog view

| Feature | Description |
|---------|-------------|
| **Virtual table** | Uses Angular CDK virtual scroll to render up to 10,000 records per tenant without layout jank. Sorting and filtering are client-side for <= 20k rows; the UI upgrades to server-side queries automatically when more records exist. |
| **Preset segments** | Quick toggles for `All`, `Recent (7 d)`, `Delta-ready`, `Attested`, and `Offline snapshots`. Each preset maps to saved view tokens for CLI parity. |
| **Search** | Global search field supports image digests, repository tags, SBOM IDs, and component PURLs. Search terms propagate to the detail view when opened. |
| **Badges** | - `Delta` badge indicates SBOM produced via delta mode (layers reuse).<br>- `Attested` badge links to Attestor proof and Rekor record.<br>- `Snapshot` badge shows offline import hash.<br>- `Policy` badge references last policy verdict summary. |
| **Bulk actions** | Multi-select rows to stage export jobs, trigger async explain generation, or copy CLI commands. Actions enforce per-tenant rate limits and show authority scopes in tooltips. |

---

## 4. Detail tabs

### 4.1 Inventory tab

- Default view summarising all components with columns for package name (PURL), version, supplier, license, size, and counts of referencing layers.
- Filters: severity, ecosystem (OS, NPM, PyPI, Maven, Go, NuGet, Rust, containers), usage flag (true/false), package tags.
- Sorting: by severity (desc), version (asc), supplier.
- Cell tooltips reference Concelier advisories and Policy Engine findings when available.
- Total component count, unique suppliers, and critical severity counts appear in the header cards.

### 4.2 Usage tab

- Focuses on runtime usage (EntryTrace, runtime sensors, allow lists).
- Columns include process names, entry points, and `usedByEntrypoint` flags.
- Grouping: by entry point, by package, or by namespace (Kubernetes).
- Highlights mismatches between declared dependencies and observed usage for drift detection.

### 4.3 Components tab

- Deep dive for a single component selected from Inventory or Usage.
- Shows provenance timeline (introduced in layer, modified, removed), file paths, cryptographic hashes, and linked evidence (DSSE, Attestor bundles).
- Links to CLI commands: `stella sbom component show <digest> <purl>` and `stella sbom component export`.
- Drawer supports multi-component comparison through tabbed interface.

### 4.4 Overlays tab

- Displays Cartographer overlays: vulnerability overlays (policy verdicts), runtime overlays (process traces), and vendor advisories.
- Each overlay card lists source, generation timestamp, precedence, and staleness relative to tenant SLA.
- Toggle overlays on/off to see impact on component status; UI does not mutate canonical SBOM, it only enriches the view.
- Graph preview button opens force-directed component graph (limited to <= 500 nodes) with filters for dependency depth and relationship type.
- Overlay metadata includes the CLI parity snippet: `stella sbom overlay apply --overlay <id> --digest <digest>`.

### 4.5 Explain tab

- Integrates Policy Engine explain drawer.
- Shows rule hits, VEX overrides, and evidence per component.
- Provides "Open in Findings" link that preserves tenant and filters.

### 4.6 Exports tab

- Lists available exports (CycloneDX JSON, CycloneDX Protobuf, SPDX JSON, SPDX Tag-Value, Delta bundle, Evidence bundle).
- Each export entry shows size, hash (SHA-256), format version, and generation time.
- Download buttons respect RBAC and offline quotas; CLI callouts mirror `stella sbom export`.
- "Schedule export" launches async job for large bundles; job status integrates with `/console/downloads`.
- Includes copy-to-clipboard path for offline transfers (`/offline-kits/export/<tenant>/<digest>/<format>`).

---

## 5. Filters and presets

| Filter | Applies to | Notes |
|--------|------------|-------|
| **Severity** | Inventory, Overlays, Explain | Uses Policy Engine severity buckets and KEV flag. |
| **Ecosystem** | Inventory, Usage | Multi-select list with search; maps to package type derived from PURL. |
| **License** | Inventory | Groups by SPDX identifiers; warns on copyleft obligations. |
| **Supplier** | Inventory, Components | Autocomplete backed by SBOM metadata. |
| **Tags** | Inventory, Usage | Tags provided by Scanner or user-defined metadata. |
| **Component search** | Components, Overlays | Accepts PURL or substring; retains highlight when switching tabs. |
| **Snapshot** | Catalog | Filters to SBOMs sourced from Offline Kit or local import. |
| **Attested only** | Catalog, Exports | Limits to SBOMs signed by Attestor; displays Rekor badge. |

Saved views store combinations of these filters and expose command palette shortcuts (`Cmd+1-9 / Ctrl+1-9`).

---

## 6. Graph overlays and cartography

- Graph view is powered by Cartographer projections (tenant-scoped graph snapshots).
- Supported overlays:
  - **Dependency graph** (default) - nodes represent components, edges represent dependencies with direction (introducer -> introduced).
  - **Runtime call graph** - optional overlay layering process calls on top of dependencies.
  - **Vulnerability overlay** - colours nodes by highest severity and outlines exploited components.
- Controls: depth slider (1-6), include transitive flag, hide dev dependencies toggle, highlight vendor-specified critical paths.
- Export options: GraphML, JSON Lines, and screenshot capture (requires `graph.export`).
- Performance guardrails: overlays warn when node count exceeds 2,000; user can queue background job to render static graph for download instead.

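
The depth slider and the node-count guardrail can be illustrated with a breadth-first traversal. The adjacency-dict graph shape and both function names are assumptions for this sketch; Cartographer's actual projection format is not described here. The 1-6 depth range and the 2,000-node threshold come from the list above.

```python
from collections import deque

NODE_WARNING_THRESHOLD = 2000  # guardrail stated in this section


def dependencies_within_depth(graph, root, max_depth):
    """Return {component: depth} for everything reachable within max_depth edges.

    graph is a hypothetical {introducer: [introduced, ...]} adjacency mapping.
    """
    if not 1 <= max_depth <= 6:
        raise ValueError("depth slider range is 1-6")
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the selected depth
        for child in graph.get(node, ()):
            if child not in depth_of:
                depth_of[child] = depth_of[node] + 1
                queue.append(child)
    return depth_of


def should_warn(node_count):
    """True when the rendered graph exceeds the documented warning threshold."""
    return node_count > NODE_WARNING_THRESHOLD
```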
---

## 7. Exports and automation

- **Instant exports:** Inline downloads for CycloneDX JSON/Protobuf (<= 25 MB) and SPDX JSON (<= 25 MB).
- **Async exports:** Larger bundles stream through the download manager with resume support. UI polls `/console/downloads` every 15 seconds while export is in progress.
- **CLI parity:** Each export card displays the equivalent CLI command and environment variables (proxy, offline).
- **Compliance metadata:** Export manifests include SBOM ID, component count, hash, signature state, and policy verdict summary so auditors can validate offline.
- **Automation hooks:** Webhook button copies the `/downloads/hooks/subscribe` call for integration with CI pipelines.

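
The instant-versus-async split can be sketched as a single decision function. The 25 MB limit and the three instant formats come from the list above; interpreting MB as MiB, the format identifiers, and routing every other format through the async path are assumptions made for illustration.

```python
# Assumption: "25 MB" is treated as 25 MiB here; the page does not say which unit applies.
INSTANT_LIMIT_BYTES = 25 * 1024 * 1024

# Hypothetical identifiers for the formats listed as instant exports.
INSTANT_FORMATS = {"cyclonedx-json", "cyclonedx-protobuf", "spdx-json"}


def export_mode(export_format, size_bytes):
    """Return "instant" for small exports in an inline format, otherwise "async"."""
    if export_format in INSTANT_FORMATS and size_bytes <= INSTANT_LIMIT_BYTES:
        return "instant"
    return "async"
```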
---

## 8. Performance tips

- Virtual scroll keeps initial render under 70 ms for 10k rows; server-side pagination engages beyond that threshold.
- Graph overlay rendering uses Web Workers to keep main thread responsive; heavy layouts show "Background layout in progress" banner.
- SSE updates (new SBOM ready) refresh header cards and prepend rows without full table redraw.
- Prefetching: opening a detail drawer preloads overlays and exports concurrently; these requests cancel automatically if the user navigates away.
- Local cache (IndexedDB) stores last viewed SBOM detail for each tenant (up to 20 entries). Cache invalidates when new merge hash is observed.

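
The local cache rule (up to 20 entries, invalidated by a new merge hash) can be modelled in memory as below. The real store is IndexedDB in the browser; this sketch only illustrates the logic, and the least-recently-used eviction order, the class name, and the method signatures are assumptions. One instance would be kept per tenant.

```python
from collections import OrderedDict


class SbomDetailCache:
    """Bounded cache of SBOM detail keyed by digest, validated by merge hash."""

    def __init__(self, max_entries=20):
        self._max_entries = max_entries
        self._entries = OrderedDict()  # digest -> (merge_hash, detail)

    def put(self, digest, merge_hash, detail):
        self._entries[digest] = (merge_hash, detail)
        self._entries.move_to_end(digest)
        # Evict the least recently used entries once the limit is exceeded.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def get(self, digest, current_merge_hash):
        entry = self._entries.get(digest)
        if entry is None:
            return None
        merge_hash, detail = entry
        if merge_hash != current_merge_hash:
            # A new merge hash means the cached detail is stale.
            del self._entries[digest]
            return None
        self._entries.move_to_end(digest)
        return detail
```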
---

## 9. Offline and air-gap behaviour

- Catalog reads from Offline Kit snapshot if gateway is in sealed mode; offline banner lists snapshot ID and staleness.
- Overlays limited to data included in snapshot; missing overlays show guidance to import updated Cartographer package.
- Exports queue locally and generate tarballs ready to copy to removable media.
- CLI parity callouts switch to offline examples (using `stella sbom export --offline`).
- Tenants unavailable in snapshot are hidden from the tenant picker to prevent inconsistent views.

---

## 10. Screenshot coordination

- Placeholder images:
  - ``
  - ``
- Coordinate with Console Guild to capture updated screenshots (dark and light theme) once Sprint 23 UI stabilises. Track follow-up in Console Guild thread `#console-screenshots` dated 2025-10-26.

---

## 11. References

- `/docs/ui/console-overview.md` - navigation shell, tenant model, filters.
- `/docs/ui/navigation.md` - command palette, deep-link schema.
- `/docs/ui/downloads.md` - download queue, manifest parity, offline export handling.
- `/docs/security/console-security.md` - scopes, DPoP, CSP.
- `/docs/cli-vs-ui-parity.md` - CLI equivalence matrix.
- `/docs/architecture/console.md` (pending) - component data flows.
- `/docs/architecture/overview.md` - high-level module relationships.
- `/docs/ingestion/aggregation-only-contract.md` - provenance and guard rails.

---

## 12. Compliance checklist

- [ ] Catalog table and detail tabs documented with columns, filters, and presets.
- [ ] Overlay behaviour describes Cartographer integration and CLI parity.
- [ ] Export section includes instant vs async workflow and compliance metadata.
- [ ] Performance considerations align with UI benchmarks (virtual scroll, workers).
- [ ] Offline behaviour captured for catalog, overlays, exports.
- [ ] Screenshot placeholders and coordination notes recorded with Console Guild follow-up.
- [ ] All referenced docs verified and accessible.

---

*Last updated: 2025-10-26 (Sprint 23).*

15
docs/updates/2025-10-26-authority-graph-scopes.md
Normal file
@@ -0,0 +1,15 @@
# 2025-10-26 — Authority graph scopes documentation refresh

## Summary

- Documented least-privilege guidance for the new `graph:*` scopes in `docs/11_AUTHORITY.md` (scope mapping, tenant propagation, and DPoP expectations).
- Extended the sample client table/config to include Cartographer and Graph API registrations so downstream teams can copy/paste the correct defaults.
- Highlighted the requirement to consume `StellaOpsScopes` constants instead of hard-coded scope strings across services.

## Next steps

| Team | Follow-up | Target |
|------|-----------|--------|
| Authority Core | Ensure `/jwks` changelog references graph scope rollout in next release note. | 2025-10-28 |
| Graph API Guild | Update gateway scaffolding to request scopes from `StellaOpsScopes` once the host project lands. | Sprint 21 stand-up |
| Scheduler Guild | Confirm Cartographer client onboarding uses the new sample secret templates. | Sprint 21 stand-up |

34
docs/updates/2025-10-26-scheduler-graph-jobs.md
Normal file
@@ -0,0 +1,34 @@
# 2025-10-26 — Scheduler Graph Job DTOs ready for integration

## Summary

SCHED-MODELS-21-001 delivered the new `GraphBuildJob`/`GraphOverlayJob` contracts and SCHED-MODELS-21-002 publishes the accompanying documentation + samples for downstream teams.

Key links:

- Schema doc: `src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-21-001-GRAPH-JOBS.md`
- Samples (round-trip tested): `samples/api/scheduler/graph-build-job.json`, `samples/api/scheduler/graph-overlay-job.json`
- Event schema + sample: `docs/events/scheduler.graph.job.completed@1.json`, `docs/events/samples/scheduler.graph.job.completed@1.sample.json`
- API doc: `src/StellaOps.Scheduler.WebService/docs/SCHED-WEB-21-001-GRAPH-APIS.md`
- Tests: `StellaOps.Scheduler.Models.Tests/SamplePayloadTests.cs`, `GraphJobStateMachineTests.cs`

## Action items

| Guild | Request | Owner | Target |
| --- | --- | --- | --- |
| Scheduler WebService | Wire DTOs into upcoming `/graphs` job APIs (SCHED-WEB-21-001/002). | Scheduler Models Guild | Sprint 21 stand-up |
| Scheduler Worker | Align planners/executors with `GraphJobStateMachine` and new metadata fields. | Scheduler Models Guild | Sprint 21 stand-up |
| Cartographer | Confirm expectations for `graphSnapshotId`, `cartographerJobId`, overlay triggers. | Scheduler Models Guild | Cartographer sync 2025-10-27 |

### Notification log

- 2025-10-26 — Posted summary + action items to `#scheduler-guild` and `#cartographer-guild` using the snippet below. Both messages linked back to the schema doc and event sample for follow-up.
- 2025-10-26 — Shared the API doc link with WebService guild thread for endpoint contract review before Cartographer wiring. Highlighted new `POST /graphs/hooks/completed` + `GET /graphs/overlays/lag` behaviour and correlation IDs.

> Suggested message for Slack `#scheduler-guild` & `#cartographer-guild`:
>
> ```
> Graph job DTOs/docs are live (SCHED-MODELS-21-001/002). Samples under samples/api/scheduler, schema notes in src/StellaOps.Scheduler.Models/docs/SCHED-MODELS-21-001-GRAPH-JOBS.md. Please review before wiring SCHED-WEB-21-001/201. GraphJobStateMachine enforces status/attempt invariants—shout if you need additional states.
> ```

Record notifications here once posted.