Add tests and implement timeline ingestion options with NATS and Redis subscribers

- Introduced `BinaryReachabilityLifterTests` to validate binary lifting functionality.
- Created `PackRunWorkerOptions` for configuring worker paths and execution persistence.
- Added `TimelineIngestionOptions` for configuring NATS and Redis ingestion transports.
- Implemented `NatsTimelineEventSubscriber` for subscribing to NATS events.
- Developed `RedisTimelineEventSubscriber` for reading from Redis Streams.
- Added `TimelineEnvelopeParser` to normalize incoming event envelopes.
- Created unit tests for `TimelineEnvelopeParser` to ensure correct field mapping.
- Implemented `TimelineAuthorizationAuditSink` for logging authorization outcomes.
StellaOps Bot
2025-12-03 09:46:48 +02:00
parent e923880694
commit 35c8f9216f
520 changed files with 4416 additions and 31492 deletions

@@ -0,0 +1,34 @@
# Tenant audit & chaos kit (DEVOPS-TEN-49-001)
Artifacts live in this folder to cover tenant audit logging, usage metrics, JWKS outage chaos, and load/perf benchmarks.
## What's here
- `recording-rules.yaml`: Prometheus recording rules for per-tenant rate, error, and latency series, plus the JWKS cache hit ratio.
- `alerts.yaml`: Alert rules for error rate, JWKS cache miss spikes, p95 latency, auth failures, and rate-limit hits.
- `dashboards/tenant-audit.json`: Grafana dashboard with tenant and service variables.
- `k6-tenant-load.js`: Multi-tenant load/perf scenario (90/10 read/write split, tenant header, configurable paths).
- `jwks-chaos.sh`: iptables-based script that drops JWKS traffic for chaos drills.
## Import & wiring
1. Load `recording-rules.yaml` and `alerts.yaml` into the Prometheus rule groups for the tenancy stack.
2. Import `dashboards/tenant-audit.json` into Grafana (folder `StellaOps / Tenancy`).
3. Ensure services emit `tenant` labels on request metrics and structured logs (`tenant`, `subject`, `action`, `resource`, `result`, `traceId`).
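
As a rough illustration of step 3, the sketch below builds one structured audit log line carrying the six fields listed above. The helper name `auditLogLine`, the JSON-lines encoding, and the sample values are assumptions made for this example; they are not taken from the StellaOps services.

```javascript
// Hypothetical helper (not part of this kit): serializes one audit event
// as a JSON line containing exactly the fields named in step 3.
const REQUIRED_FIELDS = ["tenant", "subject", "action", "resource", "result", "traceId"];

function auditLogLine(event) {
  // Reject events that would produce an incomplete audit record.
  const missing = REQUIRED_FIELDS.filter(
    (field) => event[field] === undefined || event[field] === ""
  );
  if (missing.length > 0) {
    throw new Error(`audit event missing fields: ${missing.join(", ")}`);
  }
  // Copy only the known fields, in a stable order, so log lines stay uniform.
  const record = {};
  for (const field of REQUIRED_FIELDS) {
    record[field] = String(event[field]);
  }
  return JSON.stringify(record);
}

// Example usage with made-up values.
console.log(
  auditLogLine({
    tenant: "tenant-a",
    subject: "user-1",
    action: "read",
    resource: "policy/42",
    result: "allow",
    traceId: "abc123",
  })
);
```

Failing fast on a missing field is one possible design choice; a service could instead emit the record with an explicit placeholder so that no audit event is dropped.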
## Load/perf (k6)
```bash
BASE_URL=https://api.stella.local \
TENANTS=tenant-a,tenant-b,tenant-c \
TENANT_HEADER=X-StellaOps-Tenant \
VUS=5000 DURATION=15m \
k6 run ops/devops/tenant/k6-tenant-load.js
```
Set `TENANT_READ_PATHS` and `TENANT_WRITE_PATHS` to point at the Policy/Vuln/Notify endpoints under test. Default thresholds: p95 < 300 ms for reads, p95 < 600 ms for writes, and an error rate below 0.5%.
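
The following is a minimal sketch of how the environment variables and default thresholds above could map onto a k6 options object. It is not the contents of `k6-tenant-load.js`: the function names, the `op` tag used to separate reads from writes, and the fallback values for `VUS` and `DURATION` are all assumptions. The threshold expressions (`p(95)<300`, `rate<0.005`) use standard k6 threshold syntax.

```javascript
// Hypothetical sketch: derive k6-style options from environment variables.
// In a real k6 script the argument would be the __ENV global.
function buildLoadOptions(env) {
  const tenants = (env.TENANTS || "tenant-a")
    .split(",")
    .map((t) => t.trim())
    .filter(Boolean);

  return {
    tenants,
    tenantHeader: env.TENANT_HEADER || "X-StellaOps-Tenant",
    options: {
      vus: Number(env.VUS || 10), // fallback of 10 is an assumption
      duration: env.DURATION || "1m", // fallback of 1m is an assumption
      thresholds: {
        // Assumes each request is tagged with op=read or op=write.
        "http_req_duration{op:read}": ["p(95)<300"],
        "http_req_duration{op:write}": ["p(95)<600"],
        http_req_failed: ["rate<0.005"], // error rate below 0.5%
      },
    },
  };
}

// 90/10 read/write split: roll is expected to be a number in [0, 1).
function pickOperation(roll) {
  return roll < 0.9 ? "read" : "write";
}

const cfg = buildLoadOptions({ TENANTS: "tenant-a,tenant-b", VUS: "50", DURATION: "15m" });
console.log(cfg.tenants, cfg.options.vus, pickOperation(Math.random()));
```

Keeping the configuration in a pure function makes it easy to check outside k6; inside a k6 script the returned `options` value would be exported as `options`.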
## JWKS chaos drill
```bash
JWKS_HOST=authority.local JWKS_PORT=8440 DURATION=300 \
./ops/devops/tenant/jwks-chaos.sh &
BASE_URL=https://api.stella.local TENANTS=tenant-a,tenant-b \
k6 run ops/devops/tenant/k6-tenant-load.js
```
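Run the drill on an isolated agent where sudo and iptables are available, because the chaos script drops JWKS traffic at the firewall level. While it runs, watch `jwks_cache_hit_ratio:5m` and `tenant_error_rate:5m`, along with the `jwks_cache_miss_spike` and `tenant_auth_failures_spike` alerts.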