# Tenant audit & chaos kit (DEVOPS-TEN-49-001)

Artifacts live in this folder to cover tenant audit logging, usage metrics, JWKS outage chaos, and load/perf benchmarks.

## What’s here
- `recording-rules.yaml` – Prometheus recordings for per-tenant rate/error/latency and JWKS cache ratio.
- `alerts.yaml` – Alert rules for error rate, JWKS cache miss spike, p95 latency, auth failures, and rate limit hits.
- `dashboards/tenant-audit.json` – Grafana dashboard with tenant/service variables.
- `k6-tenant-load.js` – Multi-tenant load/perf scenario (read/write 90/10, tenant header, configurable paths).
- `jwks-chaos.sh` – iptables-based JWKS dropper for chaos drills.

## Import & wiring
1. Load `recording-rules.yaml` and `alerts.yaml` into the Prometheus rule groups for the tenancy stack.
2. Import `dashboards/tenant-audit.json` into Grafana (folder `StellaOps / Tenancy`).
3. Ensure services emit `tenant` labels on request metrics and structured logs (`tenant`, `subject`, `action`, `resource`, `result`, `traceId`).

## Load/perf (k6)
```bash
BASE_URL=https://api.stella.local \
TENANTS=tenant-a,tenant-b,tenant-c \
TENANT_HEADER=X-StellaOps-Tenant \
VUS=5000 DURATION=15m \
k6 run ops/devops/tenant/k6-tenant-load.js
```
Adjust `TENANT_READ_PATHS` / `TENANT_WRITE_PATHS` to point at Policy/Vuln/Notify endpoints. Default thresholds: p95 <300ms (read), <600ms (write), error rate <0.5%.

## JWKS chaos drill
```bash
JWKS_HOST=authority.local JWKS_PORT=8440 DURATION=300 \
./ops/devops/tenant/jwks-chaos.sh &
BASE_URL=https://api.stella.local TENANTS=tenant-a,tenant-b \
k6 run ops/devops/tenant/k6-tenant-load.js
```
Run on an isolated agent with sudo/iptables available. Watch `jwks_cache_hit_ratio:5m`, `tenant_error_rate:5m`, and alerts `jwks_cache_miss_spike` / `tenant_auth_failures_spike`.
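
For orientation, the drill above relies on an iptables rule that drops outbound traffic to the JWKS endpoint for a fixed window. The sketch below shows one way such a dropper could be structured. It is not the contents of `jwks-chaos.sh`: the function names (`jwks_rule`, `ipt`, `cleanup`, `jwks_chaos`), the `DRY_RUN` switch, the default values, and the choice of the `OUTPUT` chain are all assumptions made for illustration. Only `JWKS_HOST`, `JWKS_PORT`, and `DURATION` come from the command shown above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an iptables-based JWKS dropper.
# NOT the real jwks-chaos.sh; names and defaults below are assumptions.
set -euo pipefail

# Variable names match the drill command; the default values are assumed.
JWKS_HOST="${JWKS_HOST:-authority.local}"
JWKS_PORT="${JWKS_PORT:-8440}"
DURATION="${DURATION:-300}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands instead of running them

# Print the rule spec shared by insert (-I) and delete (-D).
# iptables resolves a hostname given to -d once, when the rule is added.
jwks_rule() {
  echo "OUTPUT -p tcp -d ${JWKS_HOST} --dport ${JWKS_PORT} -j DROP"
}

# Run iptables through sudo, or only echo the command when DRY_RUN=1.
ipt() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "iptables $*"
  else
    sudo iptables "$@"
  fi
}

# Remove the DROP rule; tolerate the rule already being gone.
cleanup() {
  # Unquoted on purpose: the rule spec must split into separate arguments.
  ipt -D $(jwks_rule) || true
}

# Insert the DROP rule, hold it for DURATION seconds, and register
# cleanup to run when the shell exits.
jwks_chaos() {
  trap cleanup EXIT
  ipt -I $(jwks_rule)
  sleep "${DURATION}"
}
```

Calling `DRY_RUN=1 DURATION=0 jwks_chaos` prints the iptables commands without touching the firewall, which is a cheap way to review the rule before running it for real with `DRY_RUN=0` on an isolated agent. Registering the cleanup on shell exit is a deliberate choice in this sketch, so that an aborted drill is less likely to leave the JWKS endpoint blocked.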