# Tenant audit & chaos kit (DEVOPS-TEN-49-001)

Artifacts live in this folder to cover tenant audit logging, usage metrics, JWKS outage chaos, and load/perf benchmarks.

## What’s here
- `recording-rules.yaml` – Prometheus recordings for per-tenant rate/error/latency and JWKS cache ratio.
- `alerts.yaml` – Alert rules for error rate, JWKS cache miss spike, p95 latency, auth failures, and rate limit hits.
- `dashboards/tenant-audit.json` – Grafana dashboard with tenant/service variables.
- `k6-tenant-load.js` – Multi-tenant load/perf scenario (read/write 90/10, tenant header, configurable paths).
- `jwks-chaos.sh` – iptables-based JWKS dropper for chaos drills.

## Import & wiring
1. Load `recording-rules.yaml` and `alerts.yaml` into the Prometheus rule groups for the tenancy stack.
2. Import `dashboards/tenant-audit.json` into Grafana (folder `StellaOps / Tenancy`).
3. Ensure services emit `tenant` labels on request metrics and structured logs (`tenant`, `subject`, `action`, `resource`, `result`, `traceId`).
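
The structured-log requirement in step 3 can be spot-checked before the dashboard is trusted. The following is a minimal sketch, assuming services write one JSON object per log line. The helper name `missing_audit_fields` is illustrative and not part of this kit. It matches keys with plain `grep` rather than a JSON parser, so it only confirms that each key appears somewhere in the line.

```bash
# Hypothetical helper (not shipped in this folder): report which of the
# required audit fields are absent from a single JSON log line.
REQUIRED_FIELDS="tenant subject action resource result traceId"

missing_audit_fields() {
  local line="$1" field missing=""
  for field in $REQUIRED_FIELDS; do
    # Look for the quoted key followed by a colon, e.g. "tenant":
    if ! printf '%s' "$line" | grep -q "\"$field\"[[:space:]]*:"; then
      missing="$missing $field"
    fi
  done
  # Print the missing field names, space-separated; an empty line means none.
  printf '%s\n' "${missing# }"
  # Exit status 0 only when nothing is missing.
  [ -z "$missing" ]
}
```

Passing a sampled log line as the single argument prints the absent keys and returns non-zero if any are missing, which makes it usable as a quick gate in a smoke test.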

## Load/perf (k6)
```bash
BASE_URL=https://api.stella.local \
TENANTS=tenant-a,tenant-b,tenant-c \
TENANT_HEADER=X-StellaOps-Tenant \
VUS=5000 DURATION=15m \
k6 run ops/devops/tenant/k6-tenant-load.js
```
Adjust `TENANT_READ_PATHS` / `TENANT_WRITE_PATHS` to point at Policy/Vuln/Notify endpoints. Default thresholds: p95 <300ms (read), <600ms (write), error rate <0.5%.
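
The default thresholds can also serve as a pass/fail gate outside k6, for example in a CI step. The following is a minimal sketch, assuming the p95 latencies (in milliseconds) and the error rate (in percent) have already been extracted from the run summary. The function name and argument order are illustrative, not something the k6 script provides.

```bash
# Hypothetical gate: check_thresholds <read_p95_ms> <write_p95_ms> <error_rate_pct>
# Succeeds only if all three values are under the default limits quoted above.
check_thresholds() {
  awk -v r="$1" -v w="$2" -v e="$3" 'BEGIN {
    ok = 1
    if (r + 0 >= 300) { print "read p95 " r "ms is not < 300ms"; ok = 0 }
    if (w + 0 >= 600) { print "write p95 " w "ms is not < 600ms"; ok = 0 }
    if (e + 0 >= 0.5) { print "error rate " e "% is not < 0.5%"; ok = 0 }
    exit (ok ? 0 : 1)
  }'
}
```

A call such as `check_thresholds 250 580 0.2` exits with status 0, while any value at or above its limit prints one line per violation and exits with status 1.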

## JWKS chaos drill
```bash
JWKS_HOST=authority.local JWKS_PORT=8440 DURATION=300 \
./ops/devops/tenant/jwks-chaos.sh &
BASE_URL=https://api.stella.local TENANTS=tenant-a,tenant-b \
k6 run ops/devops/tenant/k6-tenant-load.js
```
Run on an isolated agent with sudo/iptables available. Watch `jwks_cache_hit_ratio:5m`, `tenant_error_rate:5m`, and alerts `jwks_cache_miss_spike` / `tenant_auth_failures_spike`.
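
Because the drill manipulates firewall rules, it helps to fail fast when the agent lacks the needed tools. The following is a minimal preflight sketch. `require_cmds` is a hypothetical helper, not part of `jwks-chaos.sh`, and it only checks that the commands are on `PATH`, not that sudo rights are actually granted.

```bash
# Hypothetical preflight: succeed only if every named command is on PATH.
require_cmds() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: abort before starting the drill if a tool is missing.
# require_cmds sudo iptables k6 || exit 1
```

Checking every command before returning, rather than stopping at the first miss, reports all gaps on the agent in a single run.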