Add tests and implement timeline ingestion options with NATS and Redis subscribers

- Introduced `BinaryReachabilityLifterTests` to validate binary lifting functionality.
- Created `PackRunWorkerOptions` for configuring worker paths and execution persistence.
- Added `TimelineIngestionOptions` for configuring NATS and Redis ingestion transports.
- Implemented `NatsTimelineEventSubscriber` for subscribing to NATS events.
- Developed `RedisTimelineEventSubscriber` for reading from Redis Streams.
- Added `TimelineEnvelopeParser` to normalize incoming event envelopes.
- Created unit tests for `TimelineEnvelopeParser` to ensure correct field mapping.
- Implemented `TimelineAuthorizationAuditSink` for logging authorization outcomes.
StellaOps Bot
2025-12-03 09:46:48 +02:00
parent e923880694
commit 35c8f9216f
520 changed files with 4416 additions and 31492 deletions

@@ -25,12 +25,15 @@ Scope: deploy audit pipeline, capture tenant usage metrics, run JWKS outage chao
- Multi-tenant spread: at least 10 tenants, randomised per VU; ensure metrics maintain `tenant` label cardinality cap (<= 1000 active tenants).
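
One way to verify the `tenant` label cap after a load run is to count distinct label values in the exported samples. The sketch below is illustrative only: the sample shape (`{ labels: { tenant } }`) and the function names are assumptions, not part of the existing harness.

```javascript
// Cap taken from the plan above: at most 1000 active tenants.
const MAX_ACTIVE_TENANTS = 1000;

// Count distinct `tenant` label values across metric samples.
// ASSUMPTION: each sample looks like { labels: { tenant: "..." } }.
function countActiveTenants(samples) {
  const seen = new Set();
  for (const sample of samples) {
    const tenant = sample.labels && sample.labels.tenant;
    if (tenant !== undefined) {
      seen.add(tenant);
    }
  }
  return seen.size;
}

// True when the observed tenant cardinality stays within the cap.
function withinCardinalityCap(samples, cap = MAX_ACTIVE_TENANTS) {
  return countActiveTenants(samples) <= cap;
}
```

Samples without a `tenant` label are ignored rather than counted, so untagged infrastructure metrics do not inflate the total.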
## Implementation steps
- Add dashboards (Grafana folder `StellaOps / Tenancy`) with panels for per-tenant latency, error rate, rate-limit hits, JWKS cache hit rate.
- Alert rules: `tenant_error_rate_gt_0_5pct`, `jwks_cache_miss_spike`, `tenant_rate_limit_exceeded`.
- CI: add chaos test job stub (uses docker-compose + iptables fault) gated behind manual approval.
- Docs: update `deploy/README.md` Tenancy section once dashboards/alerts live.
- Add dashboards (Grafana folder `StellaOps / Tenancy`) with panels for per-tenant latency, error rate, rate-limit hits, JWKS cache hit rate, auth failures.
- Alert rules: `tenant_error_rate_gt_0_5pct`, `jwks_cache_miss_spike`, `tenant_rate_limit_exceeded`, `tenant_latency_p95_high`, `tenant_auth_failures_spike` with supporting recording rules in `recording-rules.yaml`.
- Load/perf: k6 scenario `k6-tenant-load.js` (read/write 90/10, random tenants, headers configurable) targeting 5k RPS.
- Chaos: reusable script `jwks-chaos.sh` + CI stub in `README.md` describing manual-gated run to drop JWKS egress while k6 runs.
- Docs: update `deploy/README.md` Tenancy section once dashboards/alerts live. Status: added Tenancy Observability section with import steps.
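
The traffic mix described in the Load/perf step (read/write 90/10, random tenants, configurable headers) could be sketched with a few pure helpers like the ones below. This is a minimal sketch and not the contents of `k6-tenant-load.js`: the tenant naming scheme, the `X-Tenant-Id` header name, and all function names are assumptions made for illustration.

```javascript
// Values taken from the plan: at least 10 tenants, 90% reads.
const TENANT_COUNT = 10;
const READ_RATIO = 0.9;

// Build tenant ids such as "tenant-1".
// ASSUMPTION: the naming scheme is illustrative only.
function buildTenants(count = TENANT_COUNT) {
  return Array.from({ length: count }, (_, i) => `tenant-${i + 1}`);
}

// Pick a tenant uniformly at random.
// `rand` is injectable so the selection can be tested deterministically.
function pickTenant(tenants, rand = Math.random) {
  return tenants[Math.floor(rand() * tenants.length)];
}

// Return "read" for roughly 90% of iterations and "write" otherwise.
function pickOperation(rand = Math.random, readRatio = READ_RATIO) {
  return rand() < readRatio ? 'read' : 'write';
}

// Assemble request headers for one iteration.
// ASSUMPTION: the header name is hypothetical and left configurable.
function buildHeaders(tenant, tenantHeader = 'X-Tenant-Id') {
  return { [tenantHeader]: tenant, 'Content-Type': 'application/json' };
}
```

In a k6 script these helpers would be called once per iteration, with the result passed as `headers` in the request params of `http.get` or `http.post`. A fixed request rate such as 5k RPS is typically expressed with k6's `constant-arrival-rate` executor (`rate: 5000`, `timeUnit: '1s'`) rather than a fixed VU count.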
## Artefacts
- Dashboard JSON: `ops/devops/tenant/dashboards/tenant-audit.json`
- Alert rules: `ops/devops/tenant/alerts.yaml`
- Recording rules: `ops/devops/tenant/recording-rules.yaml`
- Load/perf harness: `ops/devops/tenant/k6-tenant-load.js`
- Chaos script: `ops/devops/tenant/jwks-chaos.sh`