
# Tenant audit & chaos kit (DEVOPS-TEN-49-001)
The artifacts in this folder cover tenant audit logging, usage metrics, JWKS outage chaos drills, and load/perf benchmarks.
## What's here
- `recording-rules.yaml`: Prometheus recordings for per-tenant rate/error/latency and JWKS cache ratio.
- `alerts.yaml`: Alert rules for error rate, JWKS cache miss spike, p95 latency, auth failures, and rate limit hits.
- `dashboards/tenant-audit.json`: Grafana dashboard with tenant/service variables.
- `k6-tenant-load.js`: Multi-tenant load/perf scenario (read/write 90/10, tenant header, configurable paths).
- `jwks-chaos.sh`: iptables-based JWKS dropper for chaos drills.
## Import & wiring
1. Load `recording-rules.yaml` and `alerts.yaml` into the Prometheus rule groups for the tenancy stack.
2. Import `dashboards/tenant-audit.json` into Grafana (folder `StellaOps / Tenancy`).
3. Ensure services emit `tenant` labels on request metrics and structured logs (`tenant`, `subject`, `action`, `resource`, `result`, `traceId`).
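
To spot-check step 3, a rough sketch like the following can flag structured log lines that lack any of the listed fields. It assumes one JSON object per log line; the `missing_audit_fields` helper is hypothetical and not part of this kit, and it uses a plain `grep` key match rather than real JSON parsing, so a key nested inside another object also counts as present.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped in this folder): print the audit fields
# that are absent from a single JSON log line. Field names come from step 3.
required_fields="tenant subject action resource result traceId"

missing_audit_fields() {
  local line="$1" missing="" field
  for field in $required_fields; do
    # Crude check: look for "field": anywhere in the line (no JSON parsing).
    if ! printf '%s' "$line" | grep -q "\"$field\"[[:space:]]*:"; then
      missing="$missing $field"
    fi
  done
  # Strip the leading space; prints an empty line when nothing is missing.
  printf '%s\n' "${missing# }"
}
```

For example, `missing_audit_fields '{"tenant":"tenant-a","action":"read"}'` lists the four fields that still need to be emitted.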
## Load/perf (k6)
```bash
BASE_URL=https://api.stella.local \
TENANTS=tenant-a,tenant-b,tenant-c \
TENANT_HEADER=X-StellaOps-Tenant \
VUS=5000 DURATION=15m \
k6 run ops/devops/tenant/k6-tenant-load.js
```
Adjust `TENANT_READ_PATHS` / `TENANT_WRITE_PATHS` to point at Policy/Vuln/Notify endpoints. Default thresholds: p95 <300ms (read), <600ms (write), error rate <0.5%.
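
If you post-process results outside k6, the default thresholds can be restated as a small check. This is only a sketch: the `below` and `check_run` helpers and the variable names are hypothetical, and the three input values (read p95 in ms, write p95 in ms, error rate in percent) are whatever your run reported. It does not read any k6 output format.

```bash
#!/usr/bin/env bash
# Default limits quoted above (hypothetical variable names).
READ_P95_MAX_MS=300
WRITE_P95_MAX_MS=600
ERROR_RATE_MAX_PCT=0.5

# below LIMIT VALUE: succeeds only when VALUE is strictly less than LIMIT.
# awk handles the floating-point comparison.
below() {
  awk -v limit="$1" -v value="$2" 'BEGIN { exit !(value + 0 < limit + 0) }'
}

# check_run READ_P95_MS WRITE_P95_MS ERROR_PCT
# Prints one line per breached threshold and returns non-zero if any breached.
check_run() {
  local read_p95="$1" write_p95="$2" error_pct="$3" status=0
  below "$READ_P95_MAX_MS" "$read_p95" || { echo "read p95 ${read_p95}ms is not below ${READ_P95_MAX_MS}ms"; status=1; }
  below "$WRITE_P95_MAX_MS" "$write_p95" || { echo "write p95 ${write_p95}ms is not below ${WRITE_P95_MAX_MS}ms"; status=1; }
  below "$ERROR_RATE_MAX_PCT" "$error_pct" || { echo "error rate ${error_pct}% is not below ${ERROR_RATE_MAX_PCT}%"; status=1; }
  return "$status"
}
```

Calling `check_run 250 580 0.2` succeeds silently, while a read p95 of 310 ms would print a message and return a non-zero status.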
## JWKS chaos drill
```bash
JWKS_HOST=authority.local JWKS_PORT=8440 DURATION=300 \
./ops/devops/tenant/jwks-chaos.sh &
BASE_URL=https://api.stella.local TENANTS=tenant-a,tenant-b \
k6 run ops/devops/tenant/k6-tenant-load.js
```
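Run the drill on an isolated agent with sudo and iptables available, since the script drops JWKS traffic at the firewall level. While it runs, watch the `jwks_cache_hit_ratio:5m` and `tenant_error_rate:5m` recordings and the `jwks_cache_miss_spike` / `tenant_auth_failures_spike` alerts.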
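
One way to watch the recordings from a terminal during the drill is the Prometheus HTTP query API. The sketch below assumes Prometheus is reachable at `PROM_URL` (a hypothetical variable, defaulting here to `http://localhost:9090`); the `prom_query` helper is illustrative and not part of this kit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run an instant query against the Prometheus HTTP API
# and print the raw JSON response.
prom_query() {
  # --get sends the url-encoded expression as a query-string parameter.
  curl -fsS --get "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, `prom_query 'jwks_cache_hit_ratio:5m'` or `prom_query 'tenant_error_rate:5m'` can be re-run every few seconds while `jwks-chaos.sh` is active.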