
Tenant audit & chaos kit (DEVOPS-TEN-49-001)

This folder holds the artifacts for tenant audit logging, usage metrics, JWKS outage chaos drills, and load/performance benchmarks.

What's here

  • recording-rules.yaml: Prometheus recording rules for per-tenant request rate, error rate, and latency, plus the JWKS cache hit ratio.
  • alerts.yaml: Alert rules for tenant error rate, JWKS cache miss spikes, p95 latency, auth failures, and rate-limit hits.
  • dashboards/tenant-audit.json: Grafana dashboard with tenant and service variables.
  • k6-tenant-load.js: Multi-tenant load/perf scenario (90/10 read/write mix, tenant header, configurable paths).
  • jwks-chaos.sh: iptables-based script that drops JWKS traffic for chaos drills.
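The recording rules themselves are PromQL in recording-rules.yaml and are not reproduced here. As a hedged illustration of the two ratios they track, this shell sketch computes a per-tenant error ratio and a JWKS cache hit ratio from counter increases over a single window. The function names and the "tenant requests errors" input format are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration only; the real expressions live in recording-rules.yaml.
# Per-tenant error ratio = errors / requests over one window.
# Input (assumed): one "tenant requests errors" line per tenant, where the
# numbers are counter increases over the window (for example 5m).
# Tenants with zero requests are skipped to avoid dividing by zero.
tenant_error_ratio() {
  awk '$2 > 0 { printf "%s %.4f\n", $1, $3 / $2 }'
}

# JWKS cache hit ratio = hits / (hits + misses) over the same window.
# $1 = cache hits, $2 = cache misses; prints n/a when there was no traffic.
jwks_hit_ratio() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) { print "n/a"; exit }
    printf "%.4f\n", h / (h + m)
  }'
}
```

For example, printf 'tenant-a 2000 4\ntenant-b 1000 10\n' | tenant_error_ratio reports 0.0020 for tenant-a and 0.0100 for tenant-b, and jwks_hit_ratio 950 50 reports 0.9500.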

Import & wiring

  1. Load recording-rules.yaml and alerts.yaml into the Prometheus rule groups for the tenancy stack.
  2. Import dashboards/tenant-audit.json into Grafana (folder StellaOps / Tenancy).
  3. Ensure services attach a tenant label to request metrics and emit structured logs with the fields tenant, subject, action, resource, result, and traceId.
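To spot-check step 3, a shell sketch like the following can flag log lines that lack one of the listed audit fields. It assumes one JSON object per line with keys written as "key": and uses plain string matching rather than a JSON parser; check_audit_fields is a hypothetical helper, not part of this kit.

```shell
#!/bin/sh
# Hypothetical helper: checks that one JSON log line carries every tenant
# audit field from the wiring step. Naive string match on "key":, so keys
# written with a space before the colon are not recognized.
check_audit_fields() {
  line=$1
  missing=""
  for field in tenant subject action resource result traceId; do
    case "$line" in
      *"\"$field\":"*) ;;               # key present
      *) missing="$missing $field" ;;   # key absent, remember it
    esac
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Piping a service's log output through while read -r l; do check_audit_fields "$l"; done prints ok or the missing field names for each line.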

Load/perf (k6)

BASE_URL=https://api.stella.local \
TENANTS=tenant-a,tenant-b,tenant-c \
TENANT_HEADER=X-StellaOps-Tenant \
VUS=5000 DURATION=15m \
k6 run ops/devops/tenant/k6-tenant-load.js

Set TENANT_READ_PATHS and TENANT_WRITE_PATHS to point at the Policy, Vuln, and Notify endpoints. Default thresholds: p95 < 300 ms for reads, p95 < 600 ms for writes, and error rate < 0.5%.
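Assuming the thresholds are declared in k6-tenant-load.js, k6 evaluates them itself during the run. As a hedged illustration of what they mean, this sketch checks a list of latencies against a p95 limit and a failed/total count against the 0.5% error budget. The function names and the one-latency-per-line input are hypothetical, and p95 here uses the nearest-rank method, which may differ slightly from k6's own percentile calculation.

```shell
#!/bin/sh
# Hypothetical post-run check mirroring the default thresholds.
# Reads one latency in ms per line on stdin; $1 = p95 limit in ms.
# p95 uses nearest rank: the ceil(0.95 * N)-th smallest sample,
# computed with integer arithmetic to avoid floating-point rounding.
p95_under() {
  limit=$1
  sort -n | awk -v limit="$limit" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "no samples"; exit 2 }
      idx = int((NR * 95 + 99) / 100)    # ceil(0.95 * NR)
      p95 = v[idx]
      code = (p95 < limit) ? 0 : 1
      printf "%s p95=%sms limit=%sms\n", (code == 0 ? "PASS" : "FAIL"), p95, limit
      exit code
    }'
}

# Error-rate check against the 0.5% budget: $1 = failed, $2 = total requests.
error_rate_under_half_percent() {
  awk -v f="$1" -v t="$2" 'BEGIN {
    rate = (t > 0) ? f / t : 0
    code = (rate < 0.005) ? 0 : 1
    printf "%s error_rate=%.2f%%\n", (code == 0 ? "PASS" : "FAIL"), rate * 100
    exit code
  }'
}
```

With 20 samples, nearest-rank p95 is the 19th smallest value, so one slow outlier does not fail the read threshold on its own.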

JWKS chaos drill

JWKS_HOST=authority.local JWKS_PORT=8440 DURATION=300 \
./ops/devops/tenant/jwks-chaos.sh &
BASE_URL=https://api.stella.local TENANTS=tenant-a,tenant-b \
k6 run ops/devops/tenant/k6-tenant-load.js

Run the drill on an isolated agent where sudo and iptables are available. During the outage window, watch the jwks_cache_hit_ratio:5m and tenant_error_rate:5m recordings and the jwks_cache_miss_spike and tenant_auth_failures_spike alerts.
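The contents of jwks-chaos.sh are not shown here. As a hedged sketch of what an iptables-based JWKS drop window typically looks like, this function inserts a DROP rule for outbound TCP to JWKS_HOST:JWKS_PORT, holds it for DURATION seconds, and then deletes it. jwks_drop_window, run_or_echo, and the DRY_RUN switch are hypothetical; the environment variable names match the drill command above.

```shell
#!/bin/sh
# Hypothetical sketch; the real jwks-chaos.sh may differ.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it,
# which requires sudo and iptables on the agent.
run_or_echo() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

jwks_drop_window() {
  host=${JWKS_HOST:?set JWKS_HOST}
  port=${JWKS_PORT:-8440}
  duration=${DURATION:-300}
  # Same rule spec for insert (-I) and delete (-D) so -D removes exactly it.
  rule="OUTPUT -p tcp -d $host --dport $port -j DROP"

  run_or_echo sudo iptables -I $rule   # start dropping JWKS traffic
  run_or_echo sleep "$duration"        # hold the outage window
  run_or_echo sudo iptables -D $rule   # restore connectivity
}
```

Two caveats for a real script: iptables resolves a hostname given to -d once, when the rule is inserted, and a trap on INT/TERM should delete the rule so an interrupted drill does not leave JWKS blocked.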